2025-12-04T11:12:20.8767842Z Current runner version: '2.329.0'
2025-12-04T11:12:20.8770761Z Runner name: 'linux.rocm.gpu.gfx942.4.b-bphpw-runner-rf5f6'
2025-12-04T11:12:20.8771167Z Runner group name: 'default'
2025-12-04T11:12:20.8771552Z Machine name: 'linux'
2025-12-04T11:12:20.8772664Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T11:12:20.8773735Z Contents: read
2025-12-04T11:12:20.8773989Z Metadata: read
2025-12-04T11:12:20.8774211Z ##[endgroup]
2025-12-04T11:12:20.8775241Z Secret source: Actions
2025-12-04T11:12:20.8775538Z Prepare workflow directory
2025-12-04T11:12:20.9008741Z Prepare all required actions
2025-12-04T11:12:20.9028096Z Getting action download info
2025-12-04T11:12:21.3019850Z Download action repository 'pytorch/pytorch@main' (SHA:c0cb6e78404416d418350632bfc554710a5f7281)
2025-12-04T11:12:24.6203798Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T11:12:25.7441318Z Download action repository 'actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T11:12:26.6223064Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T11:12:27.4112954Z Getting action download info
2025-12-04T11:12:27.6560575Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T11:12:28.4959582Z Getting action download info
2025-12-04T11:12:28.7070195Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T11:12:29.4643405Z Getting action download info
2025-12-04T11:12:29.6938580Z Uses: pytorch/pytorch/.github/workflows/_rocm-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T11:12:29.6940831Z ##[group] Inputs
2025-12-04T11:12:29.6941005Z build-environment: linux-noble-rocm-py3.12-mi300
2025-12-04T11:12:29.6942358Z test-matrix: {"include": [{"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T11:12:29.6943885Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T11:12:29.6944188Z sync-tag:
2025-12-04T11:12:29.6944717Z timeout-minutes: 300
2025-12-04T11:12:29.6944829Z tests-to-include:
2025-12-04T11:12:29.6944930Z dashboard-tag:
2025-12-04T11:12:29.6945164Z disable-monitor: true
2025-12-04T11:12:29.6945287Z monitor-log-interval: 5
2025-12-04T11:12:29.6945413Z monitor-data-collect-interval: 1
2025-12-04T11:12:29.6945545Z ##[endgroup]
2025-12-04T11:12:29.6945790Z Complete job name: linux-noble-rocm-py3.12-mi300 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4.b, module:rocm, oncall:distributed, mem_leak_check)
2025-12-04T11:12:29.7215951Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T11:12:29.7216237Z with:
2025-12-04T11:12:29.7216341Z no-sudo: true
2025-12-04T11:12:29.7216596Z submodules: recursive
2025-12-04T11:12:29.7216705Z fetch-depth: 0
2025-12-04T11:12:29.7216852Z env:
2025-12-04T11:12:29.7216962Z GIT_DEFAULT_BRANCH: main
2025-12-04T11:12:29.7217083Z ##[endgroup]
2025-12-04T11:12:29.7259798Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T11:12:29.7260171Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T11:12:29.7266873Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T11:12:29.7267029Z env:
2025-12-04T11:12:29.7267126Z GIT_DEFAULT_BRANCH: main
2025-12-04T11:12:29.7267231Z ##[endgroup]
2025-12-04T11:12:29.7427326Z ##[group]Run actions/checkout@v4
2025-12-04T11:12:29.7427514Z with:
2025-12-04T11:12:29.7427643Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T11:12:29.7427793Z fetch-depth: 0
2025-12-04T11:12:29.7427892Z submodules: recursive
2025-12-04T11:12:29.7428117Z show-progress: false
2025-12-04T11:12:29.7428230Z repository: pytorch/pytorch
2025-12-04T11:12:29.7428401Z token: ***
2025-12-04T11:12:29.7428497Z ssh-strict: true
2025-12-04T11:12:29.7428590Z ssh-user: git
2025-12-04T11:12:29.7428694Z persist-credentials: true
2025-12-04T11:12:29.7428802Z clean: true
2025-12-04T11:12:29.7428898Z sparse-checkout-cone-mode: true
2025-12-04T11:12:29.7429015Z fetch-tags: false
2025-12-04T11:12:29.7429107Z lfs: false
2025-12-04T11:12:29.7429196Z set-safe-directory: true
2025-12-04T11:12:29.7429302Z env:
2025-12-04T11:12:29.7429388Z GIT_DEFAULT_BRANCH: main
2025-12-04T11:12:29.7429496Z ##[endgroup]
2025-12-04T11:12:29.7983557Z Syncing repository: pytorch/pytorch
2025-12-04T11:12:29.7984134Z ##[group]Getting Git version info
2025-12-04T11:12:29.7984303Z Working directory is '/home/runner/_work/pytorch/pytorch'
2025-12-04T11:12:29.7984556Z [command]/usr/bin/git version
2025-12-04T11:12:29.7984674Z git version 2.52.0
2025-12-04T11:12:29.8002139Z ##[endgroup]
2025-12-04T11:12:29.8011581Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/45c2fec6-c385-4cec-b830-b317d24ae883/.gitconfig'
2025-12-04T11:12:29.8013103Z Temporarily overriding HOME='/home/runner/_work/_temp/45c2fec6-c385-4cec-b830-b317d24ae883' before making global git config changes
2025-12-04T11:12:29.8013603Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T11:12:29.8015934Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch
2025-12-04T11:12:29.8044453Z [command]/usr/bin/git config --local --get remote.origin.url
2025-12-04T11:12:29.8065371Z https://github.com/pytorch/pytorch
2025-12-04T11:12:29.8078278Z ##[group]Removing previously created refs, to avoid conflicts
2025-12-04T11:12:29.8081349Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD
2025-12-04T11:12:29.8102647Z refs/heads/main
2025-12-04T11:12:29.8107422Z [command]/usr/bin/git checkout --detach
2025-12-04T11:12:31.5985578Z HEAD is now at c0cb6e784044 [DTensor] ExplicitRedistributionContext warning mode (#169452)
2025-12-04T11:12:31.6035226Z [command]/usr/bin/git branch --delete --force main
2025-12-04T11:12:31.6187534Z Deleted branch main (was c0cb6e784044).
2025-12-04T11:12:31.6194207Z ##[endgroup]
2025-12-04T11:12:31.6203599Z [command]/usr/bin/git submodule status
2025-12-04T11:12:31.6447527Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe)
2025-12-04T11:12:31.6496690Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081)
2025-12-04T11:12:31.6551314Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327)
2025-12-04T11:12:31.6597575Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0)
2025-12-04T11:12:31.6639793Z 3ebbc93ded7285963bff932c678fa367eb393ba6 third_party/NVTX (v3.1.0-313-g3ebbc93)
2025-12-04T11:12:31.6698071Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600)
2025-12-04T11:12:31.6983841Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656)
2025-12-04T11:12:31.7010344Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101)
2025-12-04T11:12:31.7025412Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3)
2025-12-04T11:12:31.7090513Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d)
2025-12-04T11:12:31.7170484Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0)
2025-12-04T11:12:31.7244248Z f858c30bcb16f8effd5ff46996f0514539e17abc third_party/cpuinfo (f858c30)
2025-12-04T11:12:31.7267020Z 0b1577c8c83401237d601d0d0db5210506705396 third_party/cudnn_frontend (v0.5-61-g0b1577c)
2025-12-04T11:12:31.7326169Z f88806b1e31dfa579842638740216dd41fc6c588 third_party/cutlass (v4.3.1)
2025-12-04T11:12:31.7345157Z c0b988d39a9e47c794d699f29930ed4d7c7e13a4 third_party/fbgemm (v1.4.0-rc1-2-gc0b988d39)
2025-12-04T11:12:31.7392224Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4)
2025-12-04T11:12:31.7410032Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23)
2025-12-04T11:12:31.7658820Z 407c905e45ad75fc29bf0f9bb7c5c2fd3475976f third_party/fmt (12.1.0)
2025-12-04T11:12:31.7726994Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17)
2025-12-04T11:12:31.7807120Z 54cbae0d3a67fa890b4c3d9ee162b7860315e341 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-37-g54cbae0)
2025-12-04T11:12:31.7937265Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108)
2025-12-04T11:12:31.7983297Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1)
2025-12-04T11:12:31.8017329Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5)
2025-12-04T11:12:31.8147778Z 31f85df8fbd89c188f14ef10f1ec65379786b943 third_party/kineto (heads/main)
2025-12-04T11:12:31.8172993Z d7770c89632329a9914ef1a90289917597639cbe third_party/kleidiai (v1.15.0)
2025-12-04T11:12:31.8184095Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4)
2025-12-04T11:12:31.8197253Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0)
2025-12-04T11:12:31.8411461Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0)
2025-12-04T11:12:31.8428077Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2)
2025-12-04T11:12:31.8443139Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5)
2025-12-04T11:12:31.8646175Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b)
2025-12-04T11:12:31.8699218Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master)
2025-12-04T11:12:31.8730388Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e)
2025-12-04T11:12:31.8746335Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 (v3.0.1)
2025-12-04T11:12:31.8790787Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated)
2025-12-04T11:12:31.8834167Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8)
2025-12-04T11:12:31.8879988Z 2b4cd91092d335a697416b2a3cb398283246849d third_party/tensorpipe (heads/main)
2025-12-04T11:12:31.8890019Z ##[group]Cleaning the repository
2025-12-04T11:12:31.8894117Z [command]/usr/bin/git clean -ffdx
2025-12-04T11:12:31.9009046Z [command]/usr/bin/git reset --hard HEAD
2025-12-04T11:12:31.9689022Z HEAD is now at c0cb6e784044 [DTensor] ExplicitRedistributionContext warning mode (#169452)
2025-12-04T11:12:31.9765404Z ##[endgroup]
2025-12-04T11:12:31.9767986Z ##[group]Disabling automatic garbage collection
2025-12-04T11:12:31.9771751Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T11:12:31.9802790Z ##[endgroup]
2025-12-04T11:12:31.9803005Z ##[group]Setting up auth
2025-12-04T11:12:31.9806604Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T11:12:31.9826339Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T11:12:31.9989916Z Entering 'android/libs/fbjni'
2025-12-04T11:12:32.0035141Z Entering 'third_party/FP16'
2025-12-04T11:12:32.0061829Z Entering 'third_party/FXdiv'
2025-12-04T11:12:32.0097295Z Entering 'third_party/NNPACK'
2025-12-04T11:12:32.0125148Z Entering 'third_party/NVTX'
2025-12-04T11:12:32.0149656Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T11:12:32.0173634Z Entering 'third_party/XNNPACK'
2025-12-04T11:12:32.0203150Z Entering 'third_party/aiter'
2025-12-04T11:12:32.0226326Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T11:12:32.0260150Z Entering 'third_party/benchmark'
2025-12-04T11:12:32.0288004Z Entering 'third_party/composable_kernel'
2025-12-04T11:12:32.0319851Z Entering 'third_party/cpp-httplib'
2025-12-04T11:12:32.0346254Z Entering 'third_party/cpuinfo'
2025-12-04T11:12:32.0370334Z Entering 'third_party/cudnn_frontend'
2025-12-04T11:12:32.0391535Z Entering 'third_party/cutlass'
2025-12-04T11:12:32.0416291Z Entering 'third_party/fbgemm'
2025-12-04T11:12:32.0443174Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T11:12:32.0472976Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T11:12:32.0504370Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T11:12:32.0533821Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T11:12:32.0574195Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T11:12:32.0599594Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T11:12:32.0623109Z Entering 'third_party/fbgemm/external/json'
2025-12-04T11:12:32.0647846Z Entering 'third_party/flash-attention'
2025-12-04T11:12:32.0669192Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T11:12:32.0693093Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T11:12:32.0730517Z Entering 'third_party/flatbuffers'
2025-12-04T11:12:32.0760386Z Entering 'third_party/fmt'
2025-12-04T11:12:32.0785488Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T11:12:32.0807420Z Entering 'third_party/gloo'
2025-12-04T11:12:32.0829140Z Entering 'third_party/googletest'
2025-12-04T11:12:32.0861982Z Entering 'third_party/ideep'
2025-12-04T11:12:32.0884366Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T11:12:32.0920728Z Entering 'third_party/ittapi'
2025-12-04T11:12:32.0954339Z Entering 'third_party/kineto'
2025-12-04T11:12:32.0981849Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T11:12:32.1013647Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T11:12:32.1042994Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T11:12:32.1071852Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T11:12:32.1101501Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T11:12:32.1124406Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T11:12:32.1148956Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T11:12:32.1171849Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T11:12:32.1196060Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T11:12:32.1221937Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T11:12:32.1252524Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T11:12:32.1277413Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T11:12:32.1301472Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T11:12:32.1327450Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T11:12:32.1359639Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T11:12:32.1383822Z Entering 'third_party/kleidiai'
2025-12-04T11:12:32.1404888Z Entering 'third_party/mimalloc'
2025-12-04T11:12:32.1426385Z Entering 'third_party/nlohmann'
2025-12-04T11:12:32.1452865Z Entering 'third_party/onnx'
2025-12-04T11:12:32.1487312Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T11:12:32.1528196Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T11:12:32.1558384Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T11:12:32.1589203Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T11:12:32.1613617Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T11:12:32.1650870Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T11:12:32.1678594Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T11:12:32.1704971Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T11:12:32.1731728Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T11:12:32.1758061Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T11:12:32.1780821Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T11:12:32.1809172Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T11:12:32.1843384Z Entering 'third_party/pocketfft'
2025-12-04T11:12:32.1865098Z Entering 'third_party/protobuf'
2025-12-04T11:12:32.1889445Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T11:12:32.1911211Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T11:12:32.1932890Z Entering 'third_party/psimd'
2025-12-04T11:12:32.1954824Z Entering 'third_party/pthreadpool'
2025-12-04T11:12:32.1978273Z Entering 'third_party/pybind11'
2025-12-04T11:12:32.1998042Z Entering 'third_party/python-peachpy'
2025-12-04T11:12:32.2025899Z Entering 'third_party/sleef'
2025-12-04T11:12:32.2051108Z Entering 'third_party/tensorpipe'
2025-12-04T11:12:32.2080212Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T11:12:32.2108585Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T11:12:32.2137289Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T11:12:32.2163485Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T11:12:32.2184225Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T11:12:32.2235931Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T11:12:32.2261830Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T11:12:32.2446645Z Entering 'android/libs/fbjni'
2025-12-04T11:12:32.2471605Z Entering 'third_party/FP16'
2025-12-04T11:12:32.2494291Z Entering 'third_party/FXdiv'
2025-12-04T11:12:32.2517594Z Entering 'third_party/NNPACK'
2025-12-04T11:12:32.2547367Z Entering 'third_party/NVTX'
2025-12-04T11:12:32.2571160Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T11:12:32.2597386Z Entering 'third_party/XNNPACK'
2025-12-04T11:12:32.2626997Z Entering 'third_party/aiter'
2025-12-04T11:12:32.2649277Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T11:12:32.2681946Z Entering 'third_party/benchmark'
2025-12-04T11:12:32.2702834Z Entering 'third_party/composable_kernel'
2025-12-04T11:12:32.2729808Z Entering 'third_party/cpp-httplib'
2025-12-04T11:12:32.2752193Z Entering 'third_party/cpuinfo'
2025-12-04T11:12:32.2778216Z Entering 'third_party/cudnn_frontend'
2025-12-04T11:12:32.2801018Z Entering 'third_party/cutlass'
2025-12-04T11:12:32.2830669Z Entering 'third_party/fbgemm'
2025-12-04T11:12:32.2858501Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T11:12:32.2877725Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T11:12:32.2899733Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T11:12:32.2925007Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T11:12:32.2949756Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T11:12:32.2968854Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T11:12:32.2990772Z Entering 'third_party/fbgemm/external/json'
2025-12-04T11:12:32.3012519Z Entering 'third_party/flash-attention'
2025-12-04T11:12:32.3034494Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T11:12:32.3057602Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T11:12:32.3085997Z Entering 'third_party/flatbuffers'
2025-12-04T11:12:32.3109423Z Entering 'third_party/fmt'
2025-12-04T11:12:32.3132334Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T11:12:32.3153590Z Entering 'third_party/gloo'
2025-12-04T11:12:32.3176333Z Entering 'third_party/googletest'
2025-12-04T11:12:32.3198637Z Entering 'third_party/ideep'
2025-12-04T11:12:32.3224082Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T11:12:32.3255549Z Entering 'third_party/ittapi'
2025-12-04T11:12:32.3278143Z Entering 'third_party/kineto'
2025-12-04T11:12:32.3303956Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T11:12:32.3338220Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T11:12:32.3366577Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T11:12:32.3390601Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T11:12:32.3412371Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T11:12:32.3431287Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T11:12:32.3460871Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T11:12:32.3485562Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T11:12:32.3509766Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T11:12:32.3532831Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T11:12:32.3554429Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T11:12:32.3575065Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T11:12:32.3602259Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T11:12:32.3629359Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T11:12:32.3649598Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T11:12:32.3671616Z Entering 'third_party/kleidiai'
2025-12-04T11:12:32.3695375Z Entering 'third_party/mimalloc'
2025-12-04T11:12:32.3721251Z Entering 'third_party/nlohmann'
2025-12-04T11:12:32.3743378Z Entering 'third_party/onnx'
2025-12-04T11:12:32.3772322Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T11:12:32.3796625Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T11:12:32.3817597Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T11:12:32.3841944Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T11:12:32.3863156Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T11:12:32.3888318Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T11:12:32.3916169Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T11:12:32.3934736Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T11:12:32.3953024Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T11:12:32.3971877Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T11:12:32.4000395Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T11:12:32.4025673Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T11:12:32.4056950Z Entering 'third_party/pocketfft'
2025-12-04T11:12:32.4082525Z Entering 'third_party/protobuf'
2025-12-04T11:12:32.4104247Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T11:12:32.4126412Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T11:12:32.4152183Z Entering 'third_party/psimd'
2025-12-04T11:12:32.4175936Z Entering 'third_party/pthreadpool'
2025-12-04T11:12:32.4197615Z Entering 'third_party/pybind11'
2025-12-04T11:12:32.4219334Z Entering 'third_party/python-peachpy'
2025-12-04T11:12:32.4240128Z Entering 'third_party/sleef'
2025-12-04T11:12:32.4261659Z Entering 'third_party/tensorpipe'
2025-12-04T11:12:32.4282708Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T11:12:32.4302232Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T11:12:32.4324446Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T11:12:32.4346392Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T11:12:32.4370034Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T11:12:32.4409461Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.4430670Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T11:12:32.4597580Z Entering 'android/libs/fbjni'
2025-12-04T11:12:32.4608052Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url
2025-12-04T11:12:32.4617081Z Entering 'third_party/FP16'
2025-12-04T11:12:32.4628712Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url
2025-12-04T11:12:32.4640207Z Entering 'third_party/FXdiv'
2025-12-04T11:12:32.4649941Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url
2025-12-04T11:12:32.4658444Z Entering 'third_party/NNPACK'
2025-12-04T11:12:32.4670083Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url
2025-12-04T11:12:32.4681287Z Entering 'third_party/NVTX'
2025-12-04T11:12:32.4692030Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url
2025-12-04T11:12:32.4700918Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T11:12:32.4709877Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url
2025-12-04T11:12:32.4717998Z Entering 'third_party/XNNPACK'
2025-12-04T11:12:32.4727068Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url
2025-12-04T11:12:32.4741103Z Entering 'third_party/aiter'
2025-12-04T11:12:32.4750091Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url
2025-12-04T11:12:32.4760222Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T11:12:32.4776177Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url
2025-12-04T11:12:32.4794504Z Entering 'third_party/benchmark'
2025-12-04T11:12:32.4805886Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url
2025-12-04T11:12:32.4818002Z Entering 'third_party/composable_kernel'
2025-12-04T11:12:32.4828447Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url
2025-12-04T11:12:32.4843846Z Entering 'third_party/cpp-httplib'
2025-12-04T11:12:32.4853318Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url
2025-12-04T11:12:32.4863077Z Entering 'third_party/cpuinfo'
2025-12-04T11:12:32.4872680Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url
2025-12-04T11:12:32.4881746Z Entering 'third_party/cudnn_frontend'
2025-12-04T11:12:32.4891017Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url
2025-12-04T11:12:32.4899513Z Entering 'third_party/cutlass'
2025-12-04T11:12:32.4910349Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url
2025-12-04T11:12:32.4923335Z Entering 'third_party/fbgemm'
2025-12-04T11:12:32.4935391Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url
2025-12-04T11:12:32.4948789Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T11:12:32.4958138Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url
2025-12-04T11:12:32.4969159Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T11:12:32.4978808Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url
2025-12-04T11:12:32.4989227Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T11:12:32.4998577Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url
2025-12-04T11:12:32.5006618Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T11:12:32.5015377Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url
2025-12-04T11:12:32.5026062Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T11:12:32.5034641Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url
2025-12-04T11:12:32.5042427Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T11:12:32.5053015Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url
2025-12-04T11:12:32.5060070Z Entering 'third_party/fbgemm/external/json'
2025-12-04T11:12:32.5073792Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url
2025-12-04T11:12:32.5083245Z Entering 'third_party/flash-attention'
2025-12-04T11:12:32.5092519Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url
2025-12-04T11:12:32.5101739Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T11:12:32.5119966Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url
2025-12-04T11:12:32.5132775Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T11:12:32.5147876Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url
2025-12-04T11:12:32.5160949Z Entering 'third_party/flatbuffers'
2025-12-04T11:12:32.5174608Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url
2025-12-04T11:12:32.5184611Z Entering 'third_party/fmt'
2025-12-04T11:12:32.5194815Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url
2025-12-04T11:12:32.5203389Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T11:12:32.5216392Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url
2025-12-04T11:12:32.5226386Z Entering 'third_party/gloo'
2025-12-04T11:12:32.5235967Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url
2025-12-04T11:12:32.5244833Z Entering 'third_party/googletest'
2025-12-04T11:12:32.5253734Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url
2025-12-04T11:12:32.5262691Z Entering 'third_party/ideep'
2025-12-04T11:12:32.5275205Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url
2025-12-04T11:12:32.5283728Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T11:12:32.5299475Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url
2025-12-04T11:12:32.5314039Z Entering 'third_party/ittapi'
2025-12-04T11:12:32.5323527Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url
2025-12-04T11:12:32.5332078Z Entering 'third_party/kineto'
2025-12-04T11:12:32.5341738Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url
2025-12-04T11:12:32.5349878Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T11:12:32.5360771Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url
2025-12-04T11:12:32.5372456Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T11:12:32.5383512Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url
2025-12-04T11:12:32.5397279Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T11:12:32.5413450Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url
2025-12-04T11:12:32.5430036Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T11:12:32.5447372Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url
2025-12-04T11:12:32.5455876Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T11:12:32.5478944Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url
2025-12-04T11:12:32.5488639Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T11:12:32.5504610Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url
2025-12-04T11:12:32.5516627Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T11:12:32.5529548Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url
2025-12-04T11:12:32.5541653Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T11:12:32.5554215Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url
2025-12-04T11:12:32.5566818Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T11:12:32.5582554Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url
2025-12-04T11:12:32.5592374Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T11:12:32.5606123Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url
2025-12-04T11:12:32.5615232Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T11:12:32.5630164Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url
2025-12-04T11:12:32.5642058Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T11:12:32.5660278Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url
2025-12-04T11:12:32.5666662Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T11:12:32.5678305Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url
2025-12-04T11:12:32.5694474Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T11:12:32.5705660Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url
2025-12-04T11:12:32.5715125Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T11:12:32.5728176Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url
2025-12-04T11:12:32.5738014Z Entering 'third_party/kleidiai'
2025-12-04T11:12:32.5749667Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url
2025-12-04T11:12:32.5758200Z Entering 'third_party/mimalloc'
2025-12-04T11:12:32.5770177Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url
2025-12-04T11:12:32.5779100Z Entering 'third_party/nlohmann'
2025-12-04T11:12:32.5791180Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url
2025-12-04T11:12:32.5804697Z Entering 'third_party/onnx'
2025-12-04T11:12:32.5815874Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url
2025-12-04T11:12:32.5832828Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T11:12:32.5852771Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url
2025-12-04T11:12:32.5866563Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T11:12:32.5877526Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url
2025-12-04T11:12:32.5887617Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T11:12:32.5903437Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url
2025-12-04T11:12:32.5916055Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T11:12:32.5926147Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url
2025-12-04T11:12:32.5935024Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T11:12:32.5943955Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url
2025-12-04T11:12:32.5960057Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T11:12:32.5969425Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url
2025-12-04T11:12:32.5978154Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T11:12:32.5988637Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url
2025-12-04T11:12:32.5999673Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T11:12:32.6010885Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url
2025-12-04T11:12:32.6019670Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T11:12:32.6030092Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url
2025-12-04T11:12:32.6041759Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T11:12:32.6052102Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url
2025-12-04T11:12:32.6062181Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T11:12:32.6072573Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url
2025-12-04T11:12:32.6082146Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T11:12:32.6091330Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url
2025-12-04T11:12:32.6108267Z Entering 'third_party/pocketfft'
2025-12-04T11:12:32.6117771Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url
2025-12-04T11:12:32.6126511Z Entering 'third_party/protobuf'
2025-12-04T11:12:32.6137426Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url
2025-12-04T11:12:32.6147194Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T11:12:32.6166297Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url
2025-12-04T11:12:32.6175925Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T11:12:32.6193944Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url
2025-12-04T11:12:32.6205371Z Entering 'third_party/psimd'
2025-12-04T11:12:32.6216046Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url
2025-12-04T11:12:32.6224775Z Entering 'third_party/pthreadpool'
2025-12-04T11:12:32.6234386Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url
2025-12-04T11:12:32.6247602Z Entering 'third_party/pybind11'
2025-12-04T11:12:32.6259039Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url
2025-12-04T11:12:32.6267790Z Entering 'third_party/python-peachpy'
2025-12-04T11:12:32.6277422Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url
2025-12-04T11:12:32.6292435Z Entering 'third_party/sleef'
2025-12-04T11:12:32.6302049Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url
2025-12-04T11:12:32.6311789Z Entering 'third_party/tensorpipe'
2025-12-04T11:12:32.6321325Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url
2025-12-04T11:12:32.6334795Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T11:12:32.6344455Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url
2025-12-04T11:12:32.6353325Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T11:12:32.6368869Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url
2025-12-04T11:12:32.6377814Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T11:12:32.6391087Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url
2025-12-04T11:12:32.6400251Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T11:12:32.6410221Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url
2025-12-04T11:12:32.6418282Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T11:12:32.6428028Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url
2025-12-04T11:12:32.6453965Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6473102Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6512716Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6513378Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6522526Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6537393Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6551382Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6566030Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6583670Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6598471Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6613628Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6628441Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6652750Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6668778Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6687960Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6709721Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6724957Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6741096Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6757578Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6772720Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6792599Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6809645Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6833435Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6862628Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6893502Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6913249Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6933733Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6958719Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6981318Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.6997631Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7020056Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7036735Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7053934Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7074515Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7099560Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7123198Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7141128Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7161501Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7178106Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7194790Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7215016Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7232261Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7249475Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7266058Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7282351Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7297543Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7318377Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7335030Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7350252Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7367040Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7382283Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7400784Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7417303Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7433552Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7451396Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7468483Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7488633Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7510900Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7528110Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7543873Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7565311Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7582284Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7603945Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7620607Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7643260Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7659776Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7677584Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7695048Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7712453Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7733239Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7749325Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7765755Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7783161Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7803997Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7822026Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7839329Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7856372Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7877509Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7894301Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7912443Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7931435Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T11:12:32.7954840Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T11:12:32.7981919Z ##[endgroup]
2025-12-04T11:12:32.7982103Z ##[group]Fetching the repository
2025-12-04T11:12:32.7985770Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-12-04T11:12:36.1467760Z From https://github.com/pytorch/pytorch
2025-12-04T11:12:36.1468423Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+
2025-12-04T11:12:36.1468915Z * [new branch] 2.9.1 -> origin/2.9.1
2025-12-04T11:12:36.1469392Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-12-04T11:12:36.1469985Z * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1
2025-12-04T11:12:36.1470483Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-12-04T11:12:36.1470947Z * [new branch] HOPrintFunc -> origin/HOPrintFunc
2025-12-04T11:12:36.1471384Z * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1
2025-12-04T11:12:36.1471833Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128
2025-12-04T11:12:36.1472289Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug
2025-12-04T11:12:36.1472786Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix
2025-12-04T11:12:36.1473260Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue
2025-12-04T11:12:36.1473714Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable
2025-12-04T11:12:36.1474147Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero
2025-12-04T11:12:36.1474602Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging
2025-12-04T11:12:36.1475040Z * [new branch] VLA_exp -> origin/VLA_exp
2025-12-04T11:12:36.1475426Z * [new branch] activation_bench -> origin/activation_bench
2025-12-04T11:12:36.1475854Z * [new branch] addmm-heuristic -> origin/addmm-heuristic
2025-12-04T11:12:36.1476274Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64
2025-12-04T11:12:36.1476672Z * [new branch] adi/test -> origin/adi/test
2025-12-04T11:12:36.1477053Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm
2025-12-04T11:12:36.1477448Z * [new branch] adi/test_m8g -> origin/adi/test_m8g
2025-12-04T11:12:36.1477830Z * [new branch] adi/test_onednn -> origin/adi/test_onednn
2025-12-04T11:12:36.1478247Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9
2025-12-04T11:12:36.1478693Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change
2025-12-04T11:12:36.1479109Z * [new branch] adi/test_timm -> origin/adi/test_timm
2025-12-04T11:12:36.1480203Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change
2025-12-04T11:12:36.1480528Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16
2025-12-04T11:12:36.1480987Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook
2025-12-04T11:12:36.1481358Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1
2025-12-04T11:12:36.1481661Z * [new branch] also-surround-shimh -> origin/also-surround-shimh
2025-12-04T11:12:36.1481967Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile
2025-12-04T11:12:36.1482328Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files
2025-12-04T11:12:36.1482671Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark
2025-12-04T11:12:36.1483032Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization
2025-12-04T11:12:36.1483409Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader
2025-12-04T11:12:36.1483733Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const
2025-12-04T11:12:36.1484036Z * [new branch] angelayi/lstm -> origin/angelayi/lstm
2025-12-04T11:12:36.1484329Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight
2025-12-04T11:12:36.1484625Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers
2025-12-04T11:12:36.1484921Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff
2025-12-04T11:12:36.1485219Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict
2025-12-04T11:12:36.1485527Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input
2025-12-04T11:12:36.1485839Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem
2025-12-04T11:12:36.1486130Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp
2025-12-04T11:12:36.1486428Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size
2025-12-04T11:12:36.1486729Z * [new branch] annotate_assert -> origin/annotate_assert
2025-12-04T11:12:36.1487039Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel
2025-12-04T11:12:36.1487348Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy
2025-12-04T11:12:36.1487638Z * [new branch] annotation_dynamo -> origin/annotation_dynamo
2025-12-04T11:12:36.1487936Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace
2025-12-04T11:12:36.1488246Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc
2025-12-04T11:12:36.1488538Z * [new branch] aoti_const_device -> origin/aoti_const_device
2025-12-04T11:12:36.1488833Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T11:12:36.1489164Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary 2025-12-04T11:12:36.1489488Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T11:12:36.1489816Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T11:12:36.1490077Z * [new branch] async_tp -> origin/async_tp 2025-12-04T11:12:36.1490321Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T11:12:36.1490621Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T11:12:36.1490947Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T11:12:36.1491169Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-12-04T11:12:36.1491513Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T11:12:36.1491733Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T11:12:36.1491951Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T11:12:36.1492174Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T11:12:36.1492394Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T11:12:36.1492619Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T11:12:36.1492860Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T11:12:36.1493189Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-12-04T11:12:36.1493461Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T11:12:36.1493729Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T11:12:36.1493974Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T11:12:36.1494203Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T11:12:36.1494420Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T11:12:36.1494635Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T11:12:36.1494877Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T11:12:36.1495145Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T11:12:36.1495377Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T11:12:36.1495625Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T11:12:36.1495883Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T11:12:36.1496108Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T11:12:36.1496340Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 2025-12-04T11:12:36.1496557Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T11:12:36.1496770Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T11:12:36.1497003Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T11:12:36.1497246Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T11:12:36.1497483Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T11:12:36.1497713Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T11:12:36.1497986Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T11:12:36.1498419Z * 
[new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T11:12:36.1498799Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T11:12:36.1499054Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T11:12:36.1499307Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T11:12:36.1499535Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T11:12:36.1499757Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T11:12:36.1500021Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T11:12:36.1500261Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T11:12:36.1500472Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T11:12:36.1500697Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T11:12:36.1500924Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T11:12:36.1501123Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T11:12:36.1501336Z * [new branch] bf/transformer-pin-4-57-3 -> origin/bf/transformer-pin-4-57-3 2025-12-04T11:12:36.1501561Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T11:12:36.1501785Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T11:12:36.1502006Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T11:12:36.1502218Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T11:12:36.1502430Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T11:12:36.1502643Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T11:12:36.1502861Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T11:12:36.1503079Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T11:12:36.1503291Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T11:12:36.1503506Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T11:12:36.1503721Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T11:12:36.1503935Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T11:12:36.1504149Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T11:12:36.1504379Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T11:12:36.1504594Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T11:12:36.1504807Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T11:12:36.1505021Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T11:12:36.1505242Z * [new branch] brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T11:12:36.1505496Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T11:12:36.1505724Z * [new branch] bwd-backup -> origin/bwd-backup 
2025-12-04T11:12:36.1505902Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T11:12:36.1506079Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-12-04T11:12:36.1506262Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T11:12:36.1506474Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T11:12:36.1506744Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T11:12:36.1507002Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1507317Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1507594Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1507870Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1508150Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1508420Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1508698Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1508981Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1509254Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1509530Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1509850Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1510125Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1510403Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1510681Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1510953Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1511239Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1511516Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1511792Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1512095Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T11:12:36.1512343Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T11:12:36.1512551Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T11:12:36.1512741Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T11:12:36.1512933Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T11:12:36.1513125Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T11:12:36.1513309Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T11:12:36.1513487Z * [new 
branch] ci_attn -> origin/ci_attn 2025-12-04T11:12:36.1513662Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T11:12:36.1513930Z * [new branch] codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T11:12:36.1514283Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T11:12:36.1514603Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T11:12:36.1515005Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T11:12:36.1515287Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T11:12:36.1515481Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T11:12:36.1515662Z * [new branch] context_test -> origin/context_test 2025-12-04T11:12:36.1515901Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T11:12:36.1516157Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T11:12:36.1516381Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T11:12:36.1516639Z * [new branch] crpa/typo-in-inductor_comm_lowering -> origin/crpa/typo-in-inductor_comm_lowering 2025-12-04T11:12:36.1516879Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T11:12:36.1517089Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T11:12:36.1517312Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T11:12:36.1517508Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T11:12:36.1517707Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T11:12:36.1517906Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T11:12:36.1518089Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T11:12:36.1518279Z * [new branch] csl/lint_testing -> origin/csl/lint_testing 2025-12-04T11:12:36.1518466Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 2025-12-04T11:12:36.1518660Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T11:12:36.1518864Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T11:12:36.1519060Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T11:12:36.1519251Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T11:12:36.1519446Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T11:12:36.1519648Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T11:12:36.1519939Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T11:12:36.1520176Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel 2025-12-04T11:12:36.1520411Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T11:12:36.1520614Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T11:12:36.1520814Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T11:12:36.1521000Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T11:12:36.1521203Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 
2025-12-04T11:12:36.1521405Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T11:12:36.1521668Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-12-04T11:12:36.1521922Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T11:12:36.1522223Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T11:12:36.1522456Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T11:12:36.1522654Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-12-04T11:12:36.1522837Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff 2025-12-04T11:12:36.1523024Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T11:12:36.1523199Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T11:12:36.1523391Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T11:12:36.1523602Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T11:12:36.1523796Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T11:12:36.1523982Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T11:12:36.1524167Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T11:12:36.1524502Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T11:12:36.1524958Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T11:12:36.1525302Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T11:12:36.1525546Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T11:12:36.1525788Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T11:12:36.1526009Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T11:12:36.1526218Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T11:12:36.1526410Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T11:12:36.1526602Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T11:12:36.1526818Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T11:12:36.1527042Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T11:12:36.1527276Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T11:12:36.1527496Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T11:12:36.1527699Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T11:12:36.1527889Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T11:12:36.1528087Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T11:12:36.1528301Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T11:12:36.1528508Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T11:12:36.1528697Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T11:12:36.1528887Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T11:12:36.1529095Z * [new branch] docs -> 
origin/docs 2025-12-04T11:12:36.1529277Z * [new branch] documentation -> origin/documentation 2025-12-04T11:12:36.1529503Z * [new branch] eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T11:12:36.1529776Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T11:12:36.1530006Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T11:12:36.1530229Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T11:12:36.1530430Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T11:12:36.1530608Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T11:12:36.1530788Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T11:12:36.1530958Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T11:12:36.1531130Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T11:12:36.1531308Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T11:12:36.1531500Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T11:12:36.1531744Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T11:12:36.1532004Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T11:12:36.1532258Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T11:12:36.1532547Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T11:12:36.1532844Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T11:12:36.1533152Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T11:12:36.1533426Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T11:12:36.1533667Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T11:12:36.1533913Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T11:12:36.1534147Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T11:12:36.1534426Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T11:12:36.1534697Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T11:12:36.1534926Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T11:12:36.1535199Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T11:12:36.1535475Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T11:12:36.1535739Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 2025-12-04T11:12:36.1536007Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T11:12:36.1536275Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T11:12:36.1536623Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T11:12:36.1536856Z * [new branch] exec -> origin/exec 2025-12-04T11:12:36.1537040Z * [new branch] experimental-mosaic -> 
origin/experimental-mosaic 2025-12-04T11:12:36.1537275Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-12-04T11:12:36.1537547Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T11:12:36.1537736Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T11:12:36.1537924Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T11:12:36.1538109Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T11:12:36.1538291Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T11:12:36.1538484Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T11:12:36.1538670Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T11:12:36.1538855Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T11:12:36.1539043Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T11:12:36.1539224Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T11:12:36.1539410Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T11:12:36.1539594Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T11:12:36.1539847Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T11:12:36.1540031Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T11:12:36.1540217Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T11:12:36.1540397Z * [new branch] export-D83769447 -> origin/export-D83769447 2025-12-04T11:12:36.1540581Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T11:12:36.1540774Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T11:12:36.1540956Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T11:12:36.1541144Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T11:12:36.1541334Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T11:12:36.1541516Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T11:12:36.1541706Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T11:12:36.1541893Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T11:12:36.1542074Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T11:12:36.1542258Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T11:12:36.1542451Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T11:12:36.1542632Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T11:12:36.1542818Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T11:12:36.1543043Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T11:12:36.1543282Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T11:12:36.1543499Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T11:12:36.1543736Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T11:12:36.1543943Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T11:12:36.1544182Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T11:12:36.1544380Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T11:12:36.1544590Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T11:12:36.1544775Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T11:12:36.1544943Z * [new 
branch] fca -> origin/fca 2025-12-04T11:12:36.1545112Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-12-04T11:12:36.1545282Z * [new branch] fca5 -> origin/fca5 2025-12-04T11:12:36.1545475Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T11:12:36.1545693Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T11:12:36.1545903Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T11:12:36.1546092Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T11:12:36.1546292Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T11:12:36.1546491Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T11:12:36.1546691Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T11:12:36.1546896Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T11:12:36.1547102Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T11:12:36.1547313Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T11:12:36.1547528Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T11:12:36.1547737Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T11:12:36.1547960Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T11:12:36.1548180Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T11:12:36.1548367Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T11:12:36.1548553Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T11:12:36.1548757Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T11:12:36.1548961Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 2025-12-04T11:12:36.1549161Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T11:12:36.1549360Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T11:12:36.1549547Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T11:12:36.1549767Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T11:12:36.1549953Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T11:12:36.1550139Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T11:12:36.1550331Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T11:12:36.1550519Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T11:12:36.1550768Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T11:12:36.1550976Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T11:12:36.1551218Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T11:12:36.1551468Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T11:12:36.1551696Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T11:12:36.1551894Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T11:12:36.1552075Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T11:12:36.1552253Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T11:12:36.1552438Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T11:12:36.1552677Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 
2025-12-04T11:12:36.1552942Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 2025-12-04T11:12:36.1553167Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T11:12:36.1553355Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T11:12:36.1553553Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T11:12:36.1553754Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T11:12:36.1553955Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T11:12:36.1554152Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T11:12:36.1554347Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T11:12:36.1554540Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T11:12:36.1554732Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T11:12:36.1554921Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T11:12:36.1555109Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T11:12:36.1555296Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T11:12:36.1555482Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T11:12:36.1555675Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T11:12:36.1555864Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T11:12:36.1556055Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T11:12:36.1556242Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T11:12:36.1556431Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T11:12:36.1556616Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T11:12:36.1556807Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T11:12:36.1556995Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T11:12:36.1557180Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T11:12:36.1557367Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T11:12:36.1557583Z * [new branch] gh/H-Huang/228/orig -> origin/gh/H-Huang/228/orig 2025-12-04T11:12:36.1557787Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T11:12:36.1558003Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T11:12:36.1558252Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T11:12:36.1558463Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T11:12:36.1558676Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T11:12:36.1558886Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T11:12:36.1559093Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T11:12:36.1559310Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T11:12:36.1559526Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T11:12:36.1559810Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T11:12:36.1560027Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T11:12:36.1560238Z * [new branch] 
gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T11:12:36.1560450Z * [new branch] gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T11:12:36.1560663Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T11:12:36.1560875Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T11:12:36.1561087Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T11:12:36.1561302Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T11:12:36.1561517Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T11:12:36.1561730Z * [new branch] gh/IvanKobzarev/167/base -> origin/gh/IvanKobzarev/167/base 2025-12-04T11:12:36.1561945Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T11:12:36.1562154Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T11:12:36.1562366Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T11:12:36.1562577Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T11:12:36.1562789Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T11:12:36.1563006Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T11:12:36.1563220Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T11:12:36.1563429Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T11:12:36.1563644Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T11:12:36.1563854Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T11:12:36.1564069Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T11:12:36.1564281Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T11:12:36.1564493Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T11:12:36.1564702Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T11:12:36.1564962Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T11:12:36.1565174Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T11:12:36.1565548Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T11:12:36.1565766Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T11:12:36.1565980Z * [new branch] gh/IvanKobzarev/173/head -> origin/gh/IvanKobzarev/173/head 2025-12-04T11:12:36.1566191Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T11:12:36.1566405Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T11:12:36.1566621Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T11:12:36.1566837Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T11:12:36.1567049Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T11:12:36.1567267Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T11:12:36.1567476Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T11:12:36.1567694Z * [new branch] 
gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T11:12:36.1567905Z * [new branch] gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T11:12:36.1568114Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T11:12:36.1568327Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T11:12:36.1568543Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T11:12:36.1568756Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T11:12:36.1568969Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T11:12:36.1569181Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T11:12:36.1569395Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T11:12:36.1569612Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T11:12:36.1569868Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T11:12:36.1570083Z * [new branch] gh/IvanKobzarev/179/orig -> origin/gh/IvanKobzarev/179/orig 2025-12-04T11:12:36.1570298Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T11:12:36.1570507Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T11:12:36.1570725Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T11:12:36.1570943Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T11:12:36.1571150Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T11:12:36.1571368Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T11:12:36.1571582Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T11:12:36.1571790Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T11:12:36.1572003Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T11:12:36.1572266Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T11:12:36.1572475Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T11:12:36.1572721Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T11:12:36.1572932Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T11:12:36.1573142Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T11:12:36.1573358Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T11:12:36.1573580Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T11:12:36.1573787Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T11:12:36.1573999Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T11:12:36.1574213Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-12-04T11:12:36.1574417Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T11:12:36.1574628Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T11:12:36.1574836Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T11:12:36.1575038Z * [new branch] gh/NikhilAPatel/5/head -> 
origin/gh/NikhilAPatel/5/head 2025-12-04T11:12:36.1575249Z * [new branch] gh/NikhilAPatel/5/orig -> origin/gh/NikhilAPatel/5/orig 2025-12-04T11:12:36.1575448Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T11:12:36.1575635Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T11:12:36.1575826Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T11:12:36.1576009Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T11:12:36.1576200Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T11:12:36.1576391Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T11:12:36.1576572Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T11:12:36.1576756Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T11:12:36.1576942Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T11:12:36.1577122Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T11:12:36.1577310Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T11:12:36.1577497Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T11:12:36.1577677Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T11:12:36.1577868Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T11:12:36.1578054Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T11:12:36.1578240Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T11:12:36.1578425Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-12-04T11:12:36.1578612Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T11:12:36.1578794Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T11:12:36.1578982Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T11:12:36.1579204Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T11:12:36.1579393Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T11:12:36.1579612Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T11:12:36.1579851Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T11:12:36.1580034Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T11:12:36.1580219Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T11:12:36.1580406Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T11:12:36.1580591Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T11:12:36.1580777Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T11:12:36.1580960Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T11:12:36.1581147Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T11:12:36.1581338Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T11:12:36.1581522Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T11:12:36.1581710Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T11:12:36.1581895Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T11:12:36.1582079Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T11:12:36.1582266Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T11:12:36.1582469Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T11:12:36.1582678Z * [new branch] gh/PaulZhang12/25/head 
-> origin/gh/PaulZhang12/25/head 2025-12-04T11:12:36.1582882Z * [new branch] gh/PaulZhang12/25/orig -> origin/gh/PaulZhang12/25/orig 2025-12-04T11:12:36.1583089Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T11:12:36.1583290Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T11:12:36.1583499Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T11:12:36.1583704Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T11:12:36.1583906Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T11:12:36.1584110Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T11:12:36.1584318Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T11:12:36.1584522Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T11:12:36.1584733Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T11:12:36.1584936Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T11:12:36.1585136Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T11:12:36.1585339Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T11:12:36.1585547Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T11:12:36.1585747Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T11:12:36.1585996Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T11:12:36.1586195Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T11:12:36.1586405Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T11:12:36.1586650Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T11:12:36.1586850Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T11:12:36.1587061Z * [new branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T11:12:36.1587261Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T11:12:36.1587459Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T11:12:36.1587666Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T11:12:36.1587873Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T11:12:36.1588072Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T11:12:36.1588273Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T11:12:36.1588477Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T11:12:36.1588672Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T11:12:36.1588875Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T11:12:36.1589077Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T11:12:36.1589422Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T11:12:36.1589631Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T11:12:36.1589880Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T11:12:36.1590084Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 
2025-12-04T11:12:36.1590298Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head
[... several hundred similar "* [new branch] gh/<user>/<n>/<base|head|orig> -> origin/gh/<user>/<n>/<base|head|orig>" records omitted; between 11:12:36.159 and 11:12:36.174 the fetch enumerated stacked-PR branches for gh/SherlockNoMad, gh/Sidharth123-cpu, gh/StrongerXi, gh/XilunWu, gh/XuehaiPan, gh/ZhiweiYan-96, gh/aakhundov, gh/aditew01, gh/albanD, gh/alexbrauckmann, gh/alexsamardzic, gh/amjames, gh/andrewor14, gh/andyanwang, gh/angelayi, gh/anijain2305, gh/anjali411, gh/anshul-si, gh/aorenste, gh/avikchaudhuri, gh/bdhirsh, gh/benjaminglass1, gh/bobrenjc93, gh/c00w, gh/clee2000, gh/coconutruben, gh/colinchan15, gh/d4l3k, gh/davidberard98, gh/desertfire, gh/dharakk, and gh/drisspg; the listing is cut off mid-record at the end of this excerpt ...]
branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-12-04T11:12:36.1739027Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-12-04T11:12:36.1739103Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-12-04T11:12:36.1739180Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-12-04T11:12:36.1739262Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-12-04T11:12:36.1739338Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-12-04T11:12:36.1739414Z * [new branch] gh/drisspg/185/base -> origin/gh/drisspg/185/base 2025-12-04T11:12:36.1739493Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-12-04T11:12:36.1739569Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-12-04T11:12:36.1739669Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-12-04T11:12:36.1739791Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-12-04T11:12:36.1739870Z * [new branch] gh/drisspg/200/base -> origin/gh/drisspg/200/base 2025-12-04T11:12:36.1739987Z * [new branch] gh/drisspg/200/head -> origin/gh/drisspg/200/head 2025-12-04T11:12:36.1740067Z * [new branch] gh/drisspg/200/orig -> origin/gh/drisspg/200/orig 2025-12-04T11:12:36.1740142Z * [new branch] gh/drisspg/218/base -> origin/gh/drisspg/218/base 2025-12-04T11:12:36.1740218Z * [new branch] gh/drisspg/218/head -> origin/gh/drisspg/218/head 2025-12-04T11:12:36.1740298Z * [new branch] gh/drisspg/218/orig -> origin/gh/drisspg/218/orig 2025-12-04T11:12:36.1740375Z * [new branch] gh/drisspg/219/base -> origin/gh/drisspg/219/base 2025-12-04T11:12:36.1740452Z * [new branch] gh/drisspg/219/head -> origin/gh/drisspg/219/head 2025-12-04T11:12:36.1740531Z * [new branch] gh/drisspg/219/orig -> origin/gh/drisspg/219/orig 2025-12-04T11:12:36.1740608Z * [new branch] gh/drisspg/220/base -> origin/gh/drisspg/220/base 2025-12-04T11:12:36.1740688Z * [new branch] gh/drisspg/220/head -> origin/gh/drisspg/220/head 2025-12-04T11:12:36.1740762Z * [new branch] gh/drisspg/220/orig -> origin/gh/drisspg/220/orig 2025-12-04T11:12:36.1740838Z * [new branch] gh/drisspg/221/base -> origin/gh/drisspg/221/base 2025-12-04T11:12:36.1740918Z * [new branch] gh/drisspg/221/head -> origin/gh/drisspg/221/head 2025-12-04T11:12:36.1740993Z * [new branch] gh/drisspg/221/orig -> origin/gh/drisspg/221/orig 2025-12-04T11:12:36.1741070Z * [new branch] gh/drisspg/222/base -> origin/gh/drisspg/222/base 2025-12-04T11:12:36.1741150Z * [new branch] gh/drisspg/222/head -> origin/gh/drisspg/222/head 2025-12-04T11:12:36.1741225Z * [new branch] gh/drisspg/222/orig -> origin/gh/drisspg/222/orig 2025-12-04T11:12:36.1741305Z * [new branch] gh/drisspg/223/base -> origin/gh/drisspg/223/base 2025-12-04T11:12:36.1741385Z * [new branch] gh/drisspg/223/head -> origin/gh/drisspg/223/head 2025-12-04T11:12:36.1741460Z * [new branch] gh/drisspg/223/orig -> origin/gh/drisspg/223/orig 2025-12-04T11:12:36.1741535Z * [new branch] gh/drisspg/224/base -> origin/gh/drisspg/224/base 2025-12-04T11:12:36.1741615Z * [new branch] gh/drisspg/224/head -> origin/gh/drisspg/224/head 2025-12-04T11:12:36.1741691Z * [new branch] gh/drisspg/224/orig -> origin/gh/drisspg/224/orig 2025-12-04T11:12:36.1741769Z * [new branch] gh/drisspg/225/base -> origin/gh/drisspg/225/base 2025-12-04T11:12:36.1741849Z * [new branch] gh/drisspg/225/head -> origin/gh/drisspg/225/head 2025-12-04T11:12:36.1741925Z * [new branch] gh/drisspg/225/orig -> origin/gh/drisspg/225/orig 
2025-12-04T11:12:36.1742002Z * [new branch] gh/drisspg/226/base -> origin/gh/drisspg/226/base 2025-12-04T11:12:36.1742084Z * [new branch] gh/drisspg/226/head -> origin/gh/drisspg/226/head 2025-12-04T11:12:36.1742160Z * [new branch] gh/drisspg/226/orig -> origin/gh/drisspg/226/orig 2025-12-04T11:12:36.1742239Z * [new branch] gh/drisspg/227/base -> origin/gh/drisspg/227/base 2025-12-04T11:12:36.1742315Z * [new branch] gh/drisspg/227/head -> origin/gh/drisspg/227/head 2025-12-04T11:12:36.1742390Z * [new branch] gh/drisspg/227/orig -> origin/gh/drisspg/227/orig 2025-12-04T11:12:36.1742513Z * [new branch] gh/drisspg/228/base -> origin/gh/drisspg/228/base 2025-12-04T11:12:36.1742589Z * [new branch] gh/drisspg/228/head -> origin/gh/drisspg/228/head 2025-12-04T11:12:36.1742696Z * [new branch] gh/drisspg/228/orig -> origin/gh/drisspg/228/orig 2025-12-04T11:12:36.1742776Z * [new branch] gh/drisspg/229/base -> origin/gh/drisspg/229/base 2025-12-04T11:12:36.1742851Z * [new branch] gh/drisspg/229/head -> origin/gh/drisspg/229/head 2025-12-04T11:12:36.1742926Z * [new branch] gh/drisspg/229/orig -> origin/gh/drisspg/229/orig 2025-12-04T11:12:36.1743006Z * [new branch] gh/drisspg/230/base -> origin/gh/drisspg/230/base 2025-12-04T11:12:36.1743082Z * [new branch] gh/drisspg/230/head -> origin/gh/drisspg/230/head 2025-12-04T11:12:36.1743158Z * [new branch] gh/drisspg/230/orig -> origin/gh/drisspg/230/orig 2025-12-04T11:12:36.1743244Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-12-04T11:12:36.1743322Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-12-04T11:12:36.1743411Z * [new branch] gh/dzmitry-huba/1/base -> origin/gh/dzmitry-huba/1/base 2025-12-04T11:12:36.1743498Z * [new branch] gh/dzmitry-huba/1/head -> origin/gh/dzmitry-huba/1/head 2025-12-04T11:12:36.1743583Z * [new branch] gh/dzmitry-huba/12/base -> origin/gh/dzmitry-huba/12/base 2025-12-04T11:12:36.1743667Z * [new branch] gh/dzmitry-huba/12/head -> origin/gh/dzmitry-huba/12/head 2025-12-04T11:12:36.1743754Z * [new branch] gh/dzmitry-huba/12/orig -> origin/gh/dzmitry-huba/12/orig 2025-12-04T11:12:36.1743836Z * [new branch] gh/dzmitry-huba/13/base -> origin/gh/dzmitry-huba/13/base 2025-12-04T11:12:36.1743920Z * [new branch] gh/dzmitry-huba/13/head -> origin/gh/dzmitry-huba/13/head 2025-12-04T11:12:36.1744006Z * [new branch] gh/dzmitry-huba/13/orig -> origin/gh/dzmitry-huba/13/orig 2025-12-04T11:12:36.1744088Z * [new branch] gh/dzmitry-huba/14/base -> origin/gh/dzmitry-huba/14/base 2025-12-04T11:12:36.1744175Z * [new branch] gh/dzmitry-huba/14/head -> origin/gh/dzmitry-huba/14/head 2025-12-04T11:12:36.1744257Z * [new branch] gh/dzmitry-huba/14/orig -> origin/gh/dzmitry-huba/14/orig 2025-12-04T11:12:36.1744339Z * [new branch] gh/dzmitry-huba/15/base -> origin/gh/dzmitry-huba/15/base 2025-12-04T11:12:36.1744425Z * [new branch] gh/dzmitry-huba/15/head -> origin/gh/dzmitry-huba/15/head 2025-12-04T11:12:36.1744507Z * [new branch] gh/dzmitry-huba/15/orig -> origin/gh/dzmitry-huba/15/orig 2025-12-04T11:12:36.1744589Z * [new branch] gh/dzmitry-huba/16/base -> origin/gh/dzmitry-huba/16/base 2025-12-04T11:12:36.1744675Z * [new branch] gh/dzmitry-huba/16/head -> origin/gh/dzmitry-huba/16/head 2025-12-04T11:12:36.1744756Z * [new branch] gh/dzmitry-huba/16/orig -> origin/gh/dzmitry-huba/16/orig 2025-12-04T11:12:36.1744840Z * [new branch] gh/dzmitry-huba/17/base -> origin/gh/dzmitry-huba/17/base 2025-12-04T11:12:36.1744925Z * [new branch] gh/dzmitry-huba/17/head -> origin/gh/dzmitry-huba/17/head 
2025-12-04T11:12:36.1745006Z * [new branch] gh/dzmitry-huba/17/orig -> origin/gh/dzmitry-huba/17/orig 2025-12-04T11:12:36.1745088Z * [new branch] gh/dzmitry-huba/2/base -> origin/gh/dzmitry-huba/2/base 2025-12-04T11:12:36.1745173Z * [new branch] gh/dzmitry-huba/2/head -> origin/gh/dzmitry-huba/2/head 2025-12-04T11:12:36.1745255Z * [new branch] gh/dzmitry-huba/3/base -> origin/gh/dzmitry-huba/3/base 2025-12-04T11:12:36.1745368Z * [new branch] gh/dzmitry-huba/3/head -> origin/gh/dzmitry-huba/3/head 2025-12-04T11:12:36.1745454Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-12-04T11:12:36.1745559Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-12-04T11:12:36.1745640Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-12-04T11:12:36.1745717Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-12-04T11:12:36.1745795Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-12-04T11:12:36.1745875Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-12-04T11:12:36.1745952Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-12-04T11:12:36.1746032Z * [new branch] gh/eellison/823/head -> origin/gh/eellison/823/head 2025-12-04T11:12:36.1746113Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-12-04T11:12:36.1746189Z * [new branch] gh/eellison/862/base -> origin/gh/eellison/862/base 2025-12-04T11:12:36.1746269Z * [new branch] gh/eellison/862/head -> origin/gh/eellison/862/head 2025-12-04T11:12:36.1746351Z * [new branch] gh/eellison/862/orig -> origin/gh/eellison/862/orig 2025-12-04T11:12:36.1746428Z * [new branch] gh/eellison/863/base -> origin/gh/eellison/863/base 2025-12-04T11:12:36.1746505Z * [new branch] gh/eellison/863/head -> origin/gh/eellison/863/head 2025-12-04T11:12:36.1746587Z * [new branch] gh/eellison/863/orig -> origin/gh/eellison/863/orig 2025-12-04T11:12:36.1746665Z * [new branch] gh/eellison/864/base -> origin/gh/eellison/864/base 2025-12-04T11:12:36.1746744Z * [new branch] gh/eellison/864/head -> origin/gh/eellison/864/head 2025-12-04T11:12:36.1746825Z * [new branch] gh/eellison/864/orig -> origin/gh/eellison/864/orig 2025-12-04T11:12:36.1746904Z * [new branch] gh/eellison/865/base -> origin/gh/eellison/865/base 2025-12-04T11:12:36.1746981Z * [new branch] gh/eellison/865/head -> origin/gh/eellison/865/head 2025-12-04T11:12:36.1747062Z * [new branch] gh/eellison/865/orig -> origin/gh/eellison/865/orig 2025-12-04T11:12:36.1747139Z * [new branch] gh/eellison/866/base -> origin/gh/eellison/866/base 2025-12-04T11:12:36.1747220Z * [new branch] gh/eellison/866/head -> origin/gh/eellison/866/head 2025-12-04T11:12:36.1747296Z * [new branch] gh/eellison/866/orig -> origin/gh/eellison/866/orig 2025-12-04T11:12:36.1747373Z * [new branch] gh/eellison/867/base -> origin/gh/eellison/867/base 2025-12-04T11:12:36.1747455Z * [new branch] gh/eellison/867/head -> origin/gh/eellison/867/head 2025-12-04T11:12:36.1747532Z * [new branch] gh/eellison/867/orig -> origin/gh/eellison/867/orig 2025-12-04T11:12:36.1747609Z * [new branch] gh/eellison/868/base -> origin/gh/eellison/868/base 2025-12-04T11:12:36.1747690Z * [new branch] gh/eellison/868/head -> origin/gh/eellison/868/head 2025-12-04T11:12:36.1747767Z * [new branch] gh/eellison/868/orig -> origin/gh/eellison/868/orig 2025-12-04T11:12:36.1747843Z * [new branch] gh/eellison/869/base -> origin/gh/eellison/869/base 2025-12-04T11:12:36.1747924Z * [new branch] gh/eellison/869/head -> 
origin/gh/eellison/869/head 2025-12-04T11:12:36.1748001Z * [new branch] gh/eellison/869/orig -> origin/gh/eellison/869/orig 2025-12-04T11:12:36.1748104Z * [new branch] gh/eellison/870/base -> origin/gh/eellison/870/base 2025-12-04T11:12:36.1748186Z * [new branch] gh/eellison/870/head -> origin/gh/eellison/870/head 2025-12-04T11:12:36.1748263Z * [new branch] gh/eellison/870/orig -> origin/gh/eellison/870/orig 2025-12-04T11:12:36.1748362Z * [new branch] gh/eellison/871/base -> origin/gh/eellison/871/base 2025-12-04T11:12:36.1748444Z * [new branch] gh/eellison/871/head -> origin/gh/eellison/871/head 2025-12-04T11:12:36.1748521Z * [new branch] gh/eellison/871/orig -> origin/gh/eellison/871/orig 2025-12-04T11:12:36.1748601Z * [new branch] gh/eellison/872/base -> origin/gh/eellison/872/base 2025-12-04T11:12:36.1748678Z * [new branch] gh/eellison/872/head -> origin/gh/eellison/872/head 2025-12-04T11:12:36.1748754Z * [new branch] gh/eellison/872/orig -> origin/gh/eellison/872/orig 2025-12-04T11:12:36.1748837Z * [new branch] gh/eellison/873/base -> origin/gh/eellison/873/base 2025-12-04T11:12:36.1748914Z * [new branch] gh/eellison/873/head -> origin/gh/eellison/873/head 2025-12-04T11:12:36.1748991Z * [new branch] gh/eellison/873/orig -> origin/gh/eellison/873/orig 2025-12-04T11:12:36.1749071Z * [new branch] gh/eellison/874/base -> origin/gh/eellison/874/base 2025-12-04T11:12:36.1749147Z * [new branch] gh/eellison/874/head -> origin/gh/eellison/874/head 2025-12-04T11:12:36.1749223Z * [new branch] gh/eellison/874/orig -> origin/gh/eellison/874/orig 2025-12-04T11:12:36.1749304Z * [new branch] gh/eellison/875/base -> origin/gh/eellison/875/base 2025-12-04T11:12:36.1749380Z * [new branch] gh/eellison/875/head -> origin/gh/eellison/875/head 2025-12-04T11:12:36.1749458Z * [new branch] gh/eellison/875/orig -> origin/gh/eellison/875/orig 2025-12-04T11:12:36.1749540Z * [new branch] gh/eellison/876/base -> origin/gh/eellison/876/base 2025-12-04T11:12:36.1749616Z * [new branch] gh/eellison/876/head -> origin/gh/eellison/876/head 2025-12-04T11:12:36.1749740Z * [new branch] gh/eellison/876/orig -> origin/gh/eellison/876/orig 2025-12-04T11:12:36.1749823Z * [new branch] gh/eellison/877/base -> origin/gh/eellison/877/base 2025-12-04T11:12:36.1749900Z * [new branch] gh/eellison/877/head -> origin/gh/eellison/877/head 2025-12-04T11:12:36.1749976Z * [new branch] gh/eellison/877/orig -> origin/gh/eellison/877/orig 2025-12-04T11:12:36.1750057Z * [new branch] gh/eellison/878/base -> origin/gh/eellison/878/base 2025-12-04T11:12:36.1750134Z * [new branch] gh/eellison/878/head -> origin/gh/eellison/878/head 2025-12-04T11:12:36.1750216Z * [new branch] gh/eellison/878/orig -> origin/gh/eellison/878/orig 2025-12-04T11:12:36.1750293Z * [new branch] gh/eellison/879/base -> origin/gh/eellison/879/base 2025-12-04T11:12:36.1750371Z * [new branch] gh/eellison/879/head -> origin/gh/eellison/879/head 2025-12-04T11:12:36.1750453Z * [new branch] gh/eellison/879/orig -> origin/gh/eellison/879/orig 2025-12-04T11:12:36.1750530Z * [new branch] gh/eellison/880/base -> origin/gh/eellison/880/base 2025-12-04T11:12:36.1750608Z * [new branch] gh/eellison/880/head -> origin/gh/eellison/880/head 2025-12-04T11:12:36.1750688Z * [new branch] gh/eellison/880/orig -> origin/gh/eellison/880/orig 2025-12-04T11:12:36.1750764Z * [new branch] gh/eellison/881/base -> origin/gh/eellison/881/base 2025-12-04T11:12:36.1750841Z * [new branch] gh/eellison/881/head -> origin/gh/eellison/881/head 2025-12-04T11:12:36.1750964Z * [new branch] gh/eellison/881/orig -> 
origin/gh/eellison/881/orig 2025-12-04T11:12:36.1751041Z * [new branch] gh/eellison/882/base -> origin/gh/eellison/882/base 2025-12-04T11:12:36.1751159Z * [new branch] gh/eellison/882/head -> origin/gh/eellison/882/head 2025-12-04T11:12:36.1751240Z * [new branch] gh/eellison/882/orig -> origin/gh/eellison/882/orig 2025-12-04T11:12:36.1751318Z * [new branch] gh/eellison/883/base -> origin/gh/eellison/883/base 2025-12-04T11:12:36.1751393Z * [new branch] gh/eellison/883/head -> origin/gh/eellison/883/head 2025-12-04T11:12:36.1751475Z * [new branch] gh/eellison/883/orig -> origin/gh/eellison/883/orig 2025-12-04T11:12:36.1751552Z * [new branch] gh/eellison/884/base -> origin/gh/eellison/884/base 2025-12-04T11:12:36.1751635Z * [new branch] gh/eellison/884/head -> origin/gh/eellison/884/head 2025-12-04T11:12:36.1751712Z * [new branch] gh/eellison/884/orig -> origin/gh/eellison/884/orig 2025-12-04T11:12:36.1751787Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-12-04T11:12:36.1751869Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-12-04T11:12:36.1751941Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-12-04T11:12:36.1752013Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-12-04T11:12:36.1752088Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-12-04T11:12:36.1752158Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-12-04T11:12:36.1752229Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-12-04T11:12:36.1752306Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-12-04T11:12:36.1752376Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-12-04T11:12:36.1752447Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-12-04T11:12:36.1752522Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-12-04T11:12:36.1752592Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-12-04T11:12:36.1752665Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-12-04T11:12:36.1752739Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-12-04T11:12:36.1752810Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-12-04T11:12:36.1752882Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-12-04T11:12:36.1752957Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-12-04T11:12:36.1753027Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-12-04T11:12:36.1753102Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-12-04T11:12:36.1753173Z * [new branch] gh/etaf/160/orig -> origin/gh/etaf/160/orig 2025-12-04T11:12:36.1753244Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-12-04T11:12:36.1753319Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-12-04T11:12:36.1753392Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-12-04T11:12:36.1753463Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-12-04T11:12:36.1753538Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-12-04T11:12:36.1753637Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-12-04T11:12:36.1753709Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-12-04T11:12:36.1753811Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-12-04T11:12:36.1753881Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-12-04T11:12:36.1753951Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 
2025-12-04T11:12:36.1754026Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-12-04T11:12:36.1754096Z * [new branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-12-04T11:12:36.1754167Z * [new branch] gh/etaf/172/base -> origin/gh/etaf/172/base 2025-12-04T11:12:36.1754242Z * [new branch] gh/etaf/172/head -> origin/gh/etaf/172/head 2025-12-04T11:12:36.1754313Z * [new branch] gh/etaf/172/orig -> origin/gh/etaf/172/orig 2025-12-04T11:12:36.1754384Z * [new branch] gh/etaf/173/base -> origin/gh/etaf/173/base 2025-12-04T11:12:36.1754460Z * [new branch] gh/etaf/173/head -> origin/gh/etaf/173/head 2025-12-04T11:12:36.1754532Z * [new branch] gh/etaf/173/orig -> origin/gh/etaf/173/orig 2025-12-04T11:12:36.1754603Z * [new branch] gh/etaf/174/base -> origin/gh/etaf/174/base 2025-12-04T11:12:36.1754678Z * [new branch] gh/etaf/174/head -> origin/gh/etaf/174/head 2025-12-04T11:12:36.1754748Z * [new branch] gh/etaf/175/base -> origin/gh/etaf/175/base 2025-12-04T11:12:36.1754823Z * [new branch] gh/etaf/175/head -> origin/gh/etaf/175/head 2025-12-04T11:12:36.1754893Z * [new branch] gh/etaf/175/orig -> origin/gh/etaf/175/orig 2025-12-04T11:12:36.1754965Z * [new branch] gh/etaf/176/base -> origin/gh/etaf/176/base 2025-12-04T11:12:36.1755042Z * [new branch] gh/etaf/176/head -> origin/gh/etaf/176/head 2025-12-04T11:12:36.1755114Z * [new branch] gh/etaf/176/orig -> origin/gh/etaf/176/orig 2025-12-04T11:12:36.1755185Z * [new branch] gh/etaf/177/base -> origin/gh/etaf/177/base 2025-12-04T11:12:36.1755260Z * [new branch] gh/etaf/177/head -> origin/gh/etaf/177/head 2025-12-04T11:12:36.1755331Z * [new branch] gh/etaf/177/orig -> origin/gh/etaf/177/orig 2025-12-04T11:12:36.1755402Z * [new branch] gh/etaf/178/base -> origin/gh/etaf/178/base 2025-12-04T11:12:36.1755478Z * [new branch] gh/etaf/178/head -> origin/gh/etaf/178/head 2025-12-04T11:12:36.1755550Z * [new branch] gh/etaf/178/orig -> origin/gh/etaf/178/orig 2025-12-04T11:12:36.1755622Z * [new branch] gh/etaf/179/base -> origin/gh/etaf/179/base 2025-12-04T11:12:36.1755697Z * [new branch] gh/etaf/179/head -> origin/gh/etaf/179/head 2025-12-04T11:12:36.1755769Z * [new branch] gh/etaf/179/orig -> origin/gh/etaf/179/orig 2025-12-04T11:12:36.1755839Z * [new branch] gh/etaf/180/base -> origin/gh/etaf/180/base 2025-12-04T11:12:36.1755915Z * [new branch] gh/etaf/180/head -> origin/gh/etaf/180/head 2025-12-04T11:12:36.1755986Z * [new branch] gh/etaf/180/orig -> origin/gh/etaf/180/orig 2025-12-04T11:12:36.1756072Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-12-04T11:12:36.1756162Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-12-04T11:12:36.1756272Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-12-04T11:12:36.1756357Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-12-04T11:12:36.1756439Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-12-04T11:12:36.1756829Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-12-04T11:12:36.1756915Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-12-04T11:12:36.1756994Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-12-04T11:12:36.1757074Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-12-04T11:12:36.1757152Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-12-04T11:12:36.1757228Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 
2025-12-04T11:12:36.1757307Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-12-04T11:12:36.1757385Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-12-04T11:12:36.1757459Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-12-04T11:12:36.1757534Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-12-04T11:12:36.1757611Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-12-04T11:12:36.1757685Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-12-04T11:12:36.1757758Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-12-04T11:12:36.1757836Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-12-04T11:12:36.1757911Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-12-04T11:12:36.1757985Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-12-04T11:12:36.1758063Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-12-04T11:12:36.1758138Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-12-04T11:12:36.1758216Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-12-04T11:12:36.1758290Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-12-04T11:12:36.1758364Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-12-04T11:12:36.1758441Z * [new branch] gh/ezyang/3143/base -> origin/gh/ezyang/3143/base 2025-12-04T11:12:36.1758515Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-12-04T11:12:36.1758591Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-12-04T11:12:36.1758670Z * [new branch] gh/ezyang/3144/base -> origin/gh/ezyang/3144/base 2025-12-04T11:12:36.1758744Z * [new branch] gh/ezyang/3144/head -> origin/gh/ezyang/3144/head 2025-12-04T11:12:36.1758821Z * [new branch] gh/ezyang/3144/orig -> origin/gh/ezyang/3144/orig 2025-12-04T11:12:36.1758899Z * [new branch] gh/ezyang/3167/base -> origin/gh/ezyang/3167/base 2025-12-04T11:12:36.1758972Z * [new branch] gh/ezyang/3167/head -> origin/gh/ezyang/3167/head 2025-12-04T11:12:36.1759046Z * [new branch] gh/ezyang/3167/orig -> origin/gh/ezyang/3167/orig 2025-12-04T11:12:36.1759125Z * [new branch] gh/ezyang/3173/base -> origin/gh/ezyang/3173/base 2025-12-04T11:12:36.1759201Z * [new branch] gh/ezyang/3173/head -> origin/gh/ezyang/3173/head 2025-12-04T11:12:36.1759299Z * [new branch] gh/ezyang/3173/orig -> origin/gh/ezyang/3173/orig 2025-12-04T11:12:36.1759377Z * [new branch] gh/ezyang/3175/base -> origin/gh/ezyang/3175/base 2025-12-04T11:12:36.1759473Z * [new branch] gh/ezyang/3175/head -> origin/gh/ezyang/3175/head 2025-12-04T11:12:36.1759547Z * [new branch] gh/ezyang/3175/orig -> origin/gh/ezyang/3175/orig 2025-12-04T11:12:36.1759624Z * [new branch] gh/ezyang/3182/base -> origin/gh/ezyang/3182/base 2025-12-04T11:12:36.1759826Z * [new branch] gh/ezyang/3182/head -> origin/gh/ezyang/3182/head 2025-12-04T11:12:36.1759905Z * [new branch] gh/ezyang/3182/orig -> origin/gh/ezyang/3182/orig 2025-12-04T11:12:36.1759978Z * [new branch] gh/ezyang/3185/base -> origin/gh/ezyang/3185/base 2025-12-04T11:12:36.1760051Z * [new branch] gh/ezyang/3185/head -> origin/gh/ezyang/3185/head 2025-12-04T11:12:36.1760128Z * [new branch] gh/ezyang/3185/orig -> origin/gh/ezyang/3185/orig 2025-12-04T11:12:36.1760201Z * [new branch] gh/ezyang/3189/base -> origin/gh/ezyang/3189/base 2025-12-04T11:12:36.1760277Z * [new branch] gh/ezyang/3189/head -> 
origin/gh/ezyang/3189/head 2025-12-04T11:12:36.1760355Z * [new branch] gh/ezyang/3189/orig -> origin/gh/ezyang/3189/orig 2025-12-04T11:12:36.1760428Z * [new branch] gh/ezyang/3191/base -> origin/gh/ezyang/3191/base 2025-12-04T11:12:36.1760503Z * [new branch] gh/ezyang/3191/head -> origin/gh/ezyang/3191/head 2025-12-04T11:12:36.1760581Z * [new branch] gh/ezyang/3191/orig -> origin/gh/ezyang/3191/orig 2025-12-04T11:12:36.1760655Z * [new branch] gh/ezyang/3192/base -> origin/gh/ezyang/3192/base 2025-12-04T11:12:36.1760730Z * [new branch] gh/ezyang/3192/head -> origin/gh/ezyang/3192/head 2025-12-04T11:12:36.1760807Z * [new branch] gh/ezyang/3192/orig -> origin/gh/ezyang/3192/orig 2025-12-04T11:12:36.1760881Z * [new branch] gh/ezyang/3193/base -> origin/gh/ezyang/3193/base 2025-12-04T11:12:36.1760957Z * [new branch] gh/ezyang/3193/head -> origin/gh/ezyang/3193/head 2025-12-04T11:12:36.1761034Z * [new branch] gh/ezyang/3193/orig -> origin/gh/ezyang/3193/orig 2025-12-04T11:12:36.1771870Z * [new branch] gh/ezyang/3194/base -> origin/gh/ezyang/3194/base 2025-12-04T11:12:36.1771979Z * [new branch] gh/ezyang/3194/head -> origin/gh/ezyang/3194/head 2025-12-04T11:12:36.1772062Z * [new branch] gh/ezyang/3194/orig -> origin/gh/ezyang/3194/orig 2025-12-04T11:12:36.1772139Z * [new branch] gh/ezyang/3195/base -> origin/gh/ezyang/3195/base 2025-12-04T11:12:36.1772221Z * [new branch] gh/ezyang/3195/head -> origin/gh/ezyang/3195/head 2025-12-04T11:12:36.1772299Z * [new branch] gh/ezyang/3195/orig -> origin/gh/ezyang/3195/orig 2025-12-04T11:12:36.1772377Z * [new branch] gh/ezyang/3196/base -> origin/gh/ezyang/3196/base 2025-12-04T11:12:36.1772455Z * [new branch] gh/ezyang/3196/head -> origin/gh/ezyang/3196/head 2025-12-04T11:12:36.1772531Z * [new branch] gh/ezyang/3196/orig -> origin/gh/ezyang/3196/orig 2025-12-04T11:12:36.1772607Z * [new branch] gh/ezyang/3197/base -> origin/gh/ezyang/3197/base 2025-12-04T11:12:36.1772680Z * [new branch] gh/ezyang/3197/head -> origin/gh/ezyang/3197/head 2025-12-04T11:12:36.1772753Z * [new branch] gh/ezyang/3197/orig -> origin/gh/ezyang/3197/orig 2025-12-04T11:12:36.1772825Z * [new branch] gh/ezyang/3198/base -> origin/gh/ezyang/3198/base 2025-12-04T11:12:36.1772966Z * [new branch] gh/ezyang/3198/head -> origin/gh/ezyang/3198/head 2025-12-04T11:12:36.1773038Z * [new branch] gh/ezyang/3198/orig -> origin/gh/ezyang/3198/orig 2025-12-04T11:12:36.1773163Z * [new branch] gh/ezyang/3199/base -> origin/gh/ezyang/3199/base 2025-12-04T11:12:36.1773243Z * [new branch] gh/ezyang/3199/head -> origin/gh/ezyang/3199/head 2025-12-04T11:12:36.1773316Z * [new branch] gh/ezyang/3199/orig -> origin/gh/ezyang/3199/orig 2025-12-04T11:12:36.1773389Z * [new branch] gh/ezyang/3200/base -> origin/gh/ezyang/3200/base 2025-12-04T11:12:36.1773463Z * [new branch] gh/ezyang/3200/head -> origin/gh/ezyang/3200/head 2025-12-04T11:12:36.1773535Z * [new branch] gh/ezyang/3200/orig -> origin/gh/ezyang/3200/orig 2025-12-04T11:12:36.1773611Z * [new branch] gh/ezyang/3201/base -> origin/gh/ezyang/3201/base 2025-12-04T11:12:36.1773684Z * [new branch] gh/ezyang/3201/head -> origin/gh/ezyang/3201/head 2025-12-04T11:12:36.1773757Z * [new branch] gh/ezyang/3201/orig -> origin/gh/ezyang/3201/orig 2025-12-04T11:12:36.1773830Z * [new branch] gh/ezyang/3202/base -> origin/gh/ezyang/3202/base 2025-12-04T11:12:36.1773903Z * [new branch] gh/ezyang/3202/head -> origin/gh/ezyang/3202/head 2025-12-04T11:12:36.1773975Z * [new branch] gh/ezyang/3202/orig -> origin/gh/ezyang/3202/orig 2025-12-04T11:12:36.1774046Z * [new branch] 
gh/ezyang/3203/base -> origin/gh/ezyang/3203/base 2025-12-04T11:12:36.1774121Z * [new branch] gh/ezyang/3203/head -> origin/gh/ezyang/3203/head 2025-12-04T11:12:36.1774194Z * [new branch] gh/ezyang/3203/orig -> origin/gh/ezyang/3203/orig 2025-12-04T11:12:36.1774272Z * [new branch] gh/ezyang/3204/base -> origin/gh/ezyang/3204/base 2025-12-04T11:12:36.1774346Z * [new branch] gh/ezyang/3204/head -> origin/gh/ezyang/3204/head 2025-12-04T11:12:36.1774419Z * [new branch] gh/ezyang/3204/orig -> origin/gh/ezyang/3204/orig 2025-12-04T11:12:36.1774499Z * [new branch] gh/ezyang/3205/base -> origin/gh/ezyang/3205/base 2025-12-04T11:12:36.1774573Z * [new branch] gh/ezyang/3205/head -> origin/gh/ezyang/3205/head 2025-12-04T11:12:36.1774646Z * [new branch] gh/ezyang/3205/orig -> origin/gh/ezyang/3205/orig 2025-12-04T11:12:36.1774724Z * [new branch] gh/ezyang/3206/base -> origin/gh/ezyang/3206/base 2025-12-04T11:12:36.1774798Z * [new branch] gh/ezyang/3206/head -> origin/gh/ezyang/3206/head 2025-12-04T11:12:36.1774872Z * [new branch] gh/ezyang/3206/orig -> origin/gh/ezyang/3206/orig 2025-12-04T11:12:36.1774951Z * [new branch] gh/ezyang/3207/base -> origin/gh/ezyang/3207/base 2025-12-04T11:12:36.1775025Z * [new branch] gh/ezyang/3207/head -> origin/gh/ezyang/3207/head 2025-12-04T11:12:36.1775102Z * [new branch] gh/ezyang/3207/orig -> origin/gh/ezyang/3207/orig 2025-12-04T11:12:36.1775179Z * [new branch] gh/ezyang/3208/base -> origin/gh/ezyang/3208/base 2025-12-04T11:12:36.1775252Z * [new branch] gh/ezyang/3208/head -> origin/gh/ezyang/3208/head 2025-12-04T11:12:36.1775324Z * [new branch] gh/ezyang/3208/orig -> origin/gh/ezyang/3208/orig 2025-12-04T11:12:36.1775398Z * [new branch] gh/ezyang/3209/base -> origin/gh/ezyang/3209/base 2025-12-04T11:12:36.1775471Z * [new branch] gh/ezyang/3209/head -> origin/gh/ezyang/3209/head 2025-12-04T11:12:36.1775584Z * [new branch] gh/ezyang/3209/orig -> origin/gh/ezyang/3209/orig 2025-12-04T11:12:36.1775664Z * [new branch] gh/fadara01/3/base -> origin/gh/fadara01/3/base 2025-12-04T11:12:36.1775739Z * [new branch] gh/fadara01/3/head -> origin/gh/fadara01/3/head 2025-12-04T11:12:36.1775844Z * [new branch] gh/fadara01/3/orig -> origin/gh/fadara01/3/orig 2025-12-04T11:12:36.1775917Z * [new branch] gh/fadara01/5/base -> origin/gh/fadara01/5/base 2025-12-04T11:12:36.1775992Z * [new branch] gh/fadara01/5/head -> origin/gh/fadara01/5/head 2025-12-04T11:12:36.1776074Z * [new branch] gh/fadara01/5/orig -> origin/gh/fadara01/5/orig 2025-12-04T11:12:36.1776147Z * [new branch] gh/fadara01/6/base -> origin/gh/fadara01/6/base 2025-12-04T11:12:36.1776220Z * [new branch] gh/fadara01/6/head -> origin/gh/fadara01/6/head 2025-12-04T11:12:36.1776299Z * [new branch] gh/fadara01/6/orig -> origin/gh/fadara01/6/orig 2025-12-04T11:12:36.1776373Z * [new branch] gh/fadara01/7/base -> origin/gh/fadara01/7/base 2025-12-04T11:12:36.1776447Z * [new branch] gh/fadara01/7/head -> origin/gh/fadara01/7/head 2025-12-04T11:12:36.1776529Z * [new branch] gh/fadara01/7/orig -> origin/gh/fadara01/7/orig 2025-12-04T11:12:36.1776602Z * [new branch] gh/fadara01/8/base -> origin/gh/fadara01/8/base 2025-12-04T11:12:36.1776674Z * [new branch] gh/fadara01/8/head -> origin/gh/fadara01/8/head 2025-12-04T11:12:36.1776751Z * [new branch] gh/fadara01/8/orig -> origin/gh/fadara01/8/orig 2025-12-04T11:12:36.1776824Z * [new branch] gh/fadara01/9/base -> origin/gh/fadara01/9/base 2025-12-04T11:12:36.1776897Z * [new branch] gh/fadara01/9/head -> origin/gh/fadara01/9/head 2025-12-04T11:12:36.1776977Z * [new branch] 
gh/fadara01/9/orig -> origin/gh/fadara01/9/orig 2025-12-04T11:12:36.1777049Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-12-04T11:12:36.1777122Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-12-04T11:12:36.1777197Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-12-04T11:12:36.1777271Z * [new branch] gh/fduwjj/211/base -> origin/gh/fduwjj/211/base 2025-12-04T11:12:36.1777345Z * [new branch] gh/fduwjj/211/head -> origin/gh/fduwjj/211/head 2025-12-04T11:12:36.1777415Z * [new branch] gh/fduwjj/211/orig -> origin/gh/fduwjj/211/orig 2025-12-04T11:12:36.1777485Z * [new branch] gh/fduwjj/212/base -> origin/gh/fduwjj/212/base 2025-12-04T11:12:36.1777559Z * [new branch] gh/fduwjj/212/head -> origin/gh/fduwjj/212/head 2025-12-04T11:12:36.1777632Z * [new branch] gh/fduwjj/212/orig -> origin/gh/fduwjj/212/orig 2025-12-04T11:12:36.1777703Z * [new branch] gh/fduwjj/213/base -> origin/gh/fduwjj/213/base 2025-12-04T11:12:36.1777778Z * [new branch] gh/fduwjj/213/head -> origin/gh/fduwjj/213/head 2025-12-04T11:12:36.1777849Z * [new branch] gh/fduwjj/213/orig -> origin/gh/fduwjj/213/orig 2025-12-04T11:12:36.1777919Z * [new branch] gh/fduwjj/226/base -> origin/gh/fduwjj/226/base 2025-12-04T11:12:36.1777990Z * [new branch] gh/fduwjj/226/head -> origin/gh/fduwjj/226/head 2025-12-04T11:12:36.1778064Z * [new branch] gh/fduwjj/226/orig -> origin/gh/fduwjj/226/orig 2025-12-04T11:12:36.1778142Z * [new branch] gh/fduwjj/229/base -> origin/gh/fduwjj/229/base 2025-12-04T11:12:36.1778240Z * [new branch] gh/fduwjj/229/head -> origin/gh/fduwjj/229/head 2025-12-04T11:12:36.1778318Z * [new branch] gh/fduwjj/229/orig -> origin/gh/fduwjj/229/orig 2025-12-04T11:12:36.1778391Z * [new branch] gh/fduwjj/233/base -> origin/gh/fduwjj/233/base 2025-12-04T11:12:36.1778489Z * [new branch] gh/fduwjj/233/head -> origin/gh/fduwjj/233/head 2025-12-04T11:12:36.1778563Z * [new branch] gh/fduwjj/233/orig -> origin/gh/fduwjj/233/orig 2025-12-04T11:12:36.1778635Z * [new branch] gh/fduwjj/234/base -> origin/gh/fduwjj/234/base 2025-12-04T11:12:36.1778706Z * [new branch] gh/fduwjj/234/head -> origin/gh/fduwjj/234/head 2025-12-04T11:12:36.1778780Z * [new branch] gh/fduwjj/234/orig -> origin/gh/fduwjj/234/orig 2025-12-04T11:12:36.1778851Z * [new branch] gh/fduwjj/235/base -> origin/gh/fduwjj/235/base 2025-12-04T11:12:36.1778922Z * [new branch] gh/fduwjj/235/head -> origin/gh/fduwjj/235/head 2025-12-04T11:12:36.1778996Z * [new branch] gh/fduwjj/235/orig -> origin/gh/fduwjj/235/orig 2025-12-04T11:12:36.1779066Z * [new branch] gh/fduwjj/236/base -> origin/gh/fduwjj/236/base 2025-12-04T11:12:36.1779139Z * [new branch] gh/fduwjj/236/head -> origin/gh/fduwjj/236/head 2025-12-04T11:12:36.1779213Z * [new branch] gh/fduwjj/236/orig -> origin/gh/fduwjj/236/orig 2025-12-04T11:12:36.1779284Z * [new branch] gh/fduwjj/237/base -> origin/gh/fduwjj/237/base 2025-12-04T11:12:36.1779355Z * [new branch] gh/fduwjj/237/head -> origin/gh/fduwjj/237/head 2025-12-04T11:12:36.1779429Z * [new branch] gh/fduwjj/237/orig -> origin/gh/fduwjj/237/orig 2025-12-04T11:12:36.1779500Z * [new branch] gh/fduwjj/238/base -> origin/gh/fduwjj/238/base 2025-12-04T11:12:36.1779571Z * [new branch] gh/fduwjj/238/head -> origin/gh/fduwjj/238/head 2025-12-04T11:12:36.1779643Z * [new branch] gh/fduwjj/238/orig -> origin/gh/fduwjj/238/orig 2025-12-04T11:12:36.1779756Z * [new branch] gh/fduwjj/239/base -> origin/gh/fduwjj/239/base 2025-12-04T11:12:36.1779834Z * [new branch] gh/fduwjj/239/head -> origin/gh/fduwjj/239/head 
2025-12-04T11:12:36.1779906Z * [new branch] gh/fduwjj/239/orig -> origin/gh/fduwjj/239/orig 2025-12-04T11:12:36.1779980Z * [new branch] gh/fegin/332/base -> origin/gh/fegin/332/base 2025-12-04T11:12:36.1780054Z * [new branch] gh/fegin/332/head -> origin/gh/fegin/332/head 2025-12-04T11:12:36.1780125Z * [new branch] gh/fegin/332/orig -> origin/gh/fegin/332/orig 2025-12-04T11:12:36.1780196Z * [new branch] gh/fegin/333/base -> origin/gh/fegin/333/base 2025-12-04T11:12:36.1780274Z * [new branch] gh/fegin/333/head -> origin/gh/fegin/333/head 2025-12-04T11:12:36.1780344Z * [new branch] gh/fegin/333/orig -> origin/gh/fegin/333/orig 2025-12-04T11:12:36.1780419Z * [new branch] gh/fegin/334/base -> origin/gh/fegin/334/base 2025-12-04T11:12:36.1780492Z * [new branch] gh/fegin/334/head -> origin/gh/fegin/334/head 2025-12-04T11:12:36.1780562Z * [new branch] gh/fegin/334/orig -> origin/gh/fegin/334/orig 2025-12-04T11:12:36.1780631Z * [new branch] gh/fegin/335/base -> origin/gh/fegin/335/base 2025-12-04T11:12:36.1780701Z * [new branch] gh/fegin/335/head -> origin/gh/fegin/335/head 2025-12-04T11:12:36.1780770Z * [new branch] gh/fegin/335/orig -> origin/gh/fegin/335/orig 2025-12-04T11:12:36.1780843Z * [new branch] gh/fffrog/160/base -> origin/gh/fffrog/160/base 2025-12-04T11:12:36.1780953Z * [new branch] gh/fffrog/160/head -> origin/gh/fffrog/160/head 2025-12-04T11:12:36.1781025Z * [new branch] gh/fffrog/177/base -> origin/gh/fffrog/177/base 2025-12-04T11:12:36.1781138Z * [new branch] gh/fffrog/177/head -> origin/gh/fffrog/177/head 2025-12-04T11:12:36.1781210Z * [new branch] gh/fffrog/177/orig -> origin/gh/fffrog/177/orig 2025-12-04T11:12:36.1781280Z * [new branch] gh/fffrog/178/base -> origin/gh/fffrog/178/base 2025-12-04T11:12:36.1781351Z * [new branch] gh/fffrog/178/head -> origin/gh/fffrog/178/head 2025-12-04T11:12:36.1781421Z * [new branch] gh/fffrog/178/orig -> origin/gh/fffrog/178/orig 2025-12-04T11:12:36.1781492Z * [new branch] gh/fffrog/181/base -> origin/gh/fffrog/181/base 2025-12-04T11:12:36.1781567Z * [new branch] gh/fffrog/181/head -> origin/gh/fffrog/181/head 2025-12-04T11:12:36.1781637Z * [new branch] gh/fffrog/181/orig -> origin/gh/fffrog/181/orig 2025-12-04T11:12:36.1781708Z * [new branch] gh/fffrog/183/base -> origin/gh/fffrog/183/base 2025-12-04T11:12:36.1781782Z * [new branch] gh/fffrog/183/head -> origin/gh/fffrog/183/head 2025-12-04T11:12:36.1781853Z * [new branch] gh/fffrog/183/orig -> origin/gh/fffrog/183/orig 2025-12-04T11:12:36.1781926Z * [new branch] gh/fxdawnn/10/base -> origin/gh/fxdawnn/10/base 2025-12-04T11:12:36.1782000Z * [new branch] gh/fxdawnn/10/head -> origin/gh/fxdawnn/10/head 2025-12-04T11:12:36.1782071Z * [new branch] gh/fxdawnn/10/orig -> origin/gh/fxdawnn/10/orig 2025-12-04T11:12:36.1782143Z * [new branch] gh/fxdawnn/11/base -> origin/gh/fxdawnn/11/base 2025-12-04T11:12:36.1782220Z * [new branch] gh/fxdawnn/11/head -> origin/gh/fxdawnn/11/head 2025-12-04T11:12:36.1782293Z * [new branch] gh/fxdawnn/11/orig -> origin/gh/fxdawnn/11/orig 2025-12-04T11:12:36.1782368Z * [new branch] gh/fxdawnn/12/base -> origin/gh/fxdawnn/12/base 2025-12-04T11:12:36.1782451Z * [new branch] gh/fxdawnn/12/head -> origin/gh/fxdawnn/12/head 2025-12-04T11:12:36.1782523Z * [new branch] gh/fxdawnn/12/orig -> origin/gh/fxdawnn/12/orig 2025-12-04T11:12:36.1782594Z * [new branch] gh/fxdawnn/13/base -> origin/gh/fxdawnn/13/base 2025-12-04T11:12:36.1782668Z * [new branch] gh/fxdawnn/13/head -> origin/gh/fxdawnn/13/head 2025-12-04T11:12:36.1782740Z * [new branch] gh/fxdawnn/13/orig -> 
origin/gh/fxdawnn/13/orig 2025-12-04T11:12:36.1782813Z * [new branch] gh/fxdawnn/14/base -> origin/gh/fxdawnn/14/base 2025-12-04T11:12:36.1782889Z * [new branch] gh/fxdawnn/14/head -> origin/gh/fxdawnn/14/head 2025-12-04T11:12:36.1782960Z * [new branch] gh/fxdawnn/14/orig -> origin/gh/fxdawnn/14/orig 2025-12-04T11:12:36.1783035Z * [new branch] gh/fxdawnn/15/base -> origin/gh/fxdawnn/15/base 2025-12-04T11:12:36.1783235Z * [new branch] gh/fxdawnn/15/head -> origin/gh/fxdawnn/15/head 2025-12-04T11:12:36.1783307Z * [new branch] gh/fxdawnn/15/orig -> origin/gh/fxdawnn/15/orig 2025-12-04T11:12:36.1783383Z * [new branch] gh/fxdawnn/6/base -> origin/gh/fxdawnn/6/base 2025-12-04T11:12:36.1783454Z * [new branch] gh/fxdawnn/6/head -> origin/gh/fxdawnn/6/head 2025-12-04T11:12:36.1783526Z * [new branch] gh/fxdawnn/6/orig -> origin/gh/fxdawnn/6/orig 2025-12-04T11:12:36.1783599Z * [new branch] gh/fxdawnn/7/base -> origin/gh/fxdawnn/7/base 2025-12-04T11:12:36.1783705Z * [new branch] gh/fxdawnn/7/head -> origin/gh/fxdawnn/7/head 2025-12-04T11:12:36.1783780Z * [new branch] gh/fxdawnn/7/orig -> origin/gh/fxdawnn/7/orig 2025-12-04T11:12:36.1783936Z * [new branch] gh/fxdawnn/9/base -> origin/gh/fxdawnn/9/base 2025-12-04T11:12:36.1784007Z * [new branch] gh/fxdawnn/9/head -> origin/gh/fxdawnn/9/head 2025-12-04T11:12:36.1784080Z * [new branch] gh/fxdawnn/9/orig -> origin/gh/fxdawnn/9/orig 2025-12-04T11:12:36.1784154Z * [new branch] gh/galv/1/base -> origin/gh/galv/1/base 2025-12-04T11:12:36.1784223Z * [new branch] gh/galv/1/head -> origin/gh/galv/1/head 2025-12-04T11:12:36.1784290Z * [new branch] gh/galv/1/orig -> origin/gh/galv/1/orig 2025-12-04T11:12:36.1784356Z * [new branch] gh/galv/2/base -> origin/gh/galv/2/base 2025-12-04T11:12:36.1784424Z * [new branch] gh/galv/2/head -> origin/gh/galv/2/head 2025-12-04T11:12:36.1784490Z * [new branch] gh/galv/2/orig -> origin/gh/galv/2/orig 2025-12-04T11:12:36.1784564Z * [new branch] gh/galv/3/base -> origin/gh/galv/3/base 2025-12-04T11:12:36.1784631Z * [new branch] gh/galv/3/head -> origin/gh/galv/3/head 2025-12-04T11:12:36.1784699Z * [new branch] gh/galv/3/orig -> origin/gh/galv/3/orig 2025-12-04T11:12:36.1784784Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-12-04T11:12:36.1784864Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-12-04T11:12:36.1784941Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-12-04T11:12:36.1785019Z * [new branch] gh/guangyey/163/base -> origin/gh/guangyey/163/base 2025-12-04T11:12:36.1785093Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-12-04T11:12:36.1785167Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-12-04T11:12:36.1785246Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-12-04T11:12:36.1785320Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-12-04T11:12:36.1785395Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-12-04T11:12:36.1785469Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-12-04T11:12:36.1785543Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-12-04T11:12:36.1785618Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-12-04T11:12:36.1785694Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-12-04T11:12:36.1785770Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-12-04T11:12:36.1785849Z * [new branch] gh/guangyey/170/orig 
-> origin/gh/guangyey/170/orig 2025-12-04T11:12:36.1785923Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-12-04T11:12:36.1785996Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-12-04T11:12:36.1786073Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-12-04T11:12:36.1786147Z * [new branch] gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-12-04T11:12:36.1786222Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-12-04T11:12:36.1786296Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-12-04T11:12:36.1786416Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-12-04T11:12:36.1786492Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-12-04T11:12:36.1786591Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-12-04T11:12:36.1786665Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-12-04T11:12:36.1786741Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-12-04T11:12:36.1786815Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-12-04T11:12:36.1786888Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-12-04T11:12:36.1786962Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-12-04T11:12:36.1787039Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-12-04T11:12:36.1787115Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-12-04T11:12:36.1787191Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-12-04T11:12:36.1787267Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 2025-12-04T11:12:36.1787341Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-12-04T11:12:36.1787415Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-12-04T11:12:36.1787489Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-12-04T11:12:36.1787565Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-12-04T11:12:36.1787639Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-12-04T11:12:36.1787714Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-12-04T11:12:36.1787790Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-12-04T11:12:36.1787867Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-12-04T11:12:36.1787942Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-12-04T11:12:36.1788017Z * [new branch] gh/guangyey/208/base -> origin/gh/guangyey/208/base 2025-12-04T11:12:36.1788091Z * [new branch] gh/guangyey/208/head -> origin/gh/guangyey/208/head 2025-12-04T11:12:36.1788165Z * [new branch] gh/guangyey/208/orig -> origin/gh/guangyey/208/orig 2025-12-04T11:12:36.1788241Z * [new branch] gh/guangyey/228/base -> origin/gh/guangyey/228/base 2025-12-04T11:12:36.1788317Z * [new branch] gh/guangyey/228/head -> origin/gh/guangyey/228/head 2025-12-04T11:12:36.1788391Z * [new branch] gh/guangyey/228/orig -> origin/gh/guangyey/228/orig 2025-12-04T11:12:36.1788468Z * [new branch] gh/guangyey/230/base -> origin/gh/guangyey/230/base 2025-12-04T11:12:36.1788543Z * [new branch] gh/guangyey/230/head -> origin/gh/guangyey/230/head 2025-12-04T11:12:36.1788618Z * [new branch] gh/guangyey/230/orig -> origin/gh/guangyey/230/orig 2025-12-04T11:12:36.1788694Z * [new branch] gh/guangyey/231/base -> 
origin/gh/guangyey/231/base
2025-12-04T11:12:36.1788769Z * [new branch] gh/guangyey/231/head -> origin/gh/guangyey/231/head
2025-12-04T11:12:36.1788843Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig
[... several hundred similar "* [new branch]" fetch lines elided: ghstack branch refs of the form gh/<user>/<N>/base|head|orig (and a few .../next) for guangyey, guilhermeleobas, hameerabbasi, huydhn, int3, isuruf, jamesjwu, janeyx99, jansel, jbschlosser, jerryzh168, jiayisunx, jjwu@meta.com, jturney, karthickai, krocki, kurtamohler, kwen2501, laithsakka, liangel, lucaskabela, lw, malfet ...]
2025-12-04T11:12:36.1848869Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base
2025-12-04T11:12:36.1848944Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head
2025-12-04T11:12:36.1849015Z * [new branch]
gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T11:12:36.1849086Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T11:12:36.1849161Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T11:12:36.1849231Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T11:12:36.1849302Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T11:12:36.1849373Z * [new branch] gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T11:12:36.1849444Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T11:12:36.1849516Z * [new branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T11:12:36.1849590Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T11:12:36.1849661Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T11:12:36.1849783Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T11:12:36.1849858Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T11:12:36.1849929Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T11:12:36.1850000Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T11:12:36.1850072Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T11:12:36.1850145Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T11:12:36.1850219Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T11:12:36.1850292Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T11:12:36.1850363Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T11:12:36.1850436Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T11:12:36.1850508Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T11:12:36.1850582Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T11:12:36.1850655Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T11:12:36.1850728Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T11:12:36.1850799Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T11:12:36.1850873Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T11:12:36.1850984Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T11:12:36.1851056Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T11:12:36.1851175Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 2025-12-04T11:12:36.1851247Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T11:12:36.1851319Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T11:12:36.1851393Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T11:12:36.1851463Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T11:12:36.1851532Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T11:12:36.1851607Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T11:12:36.1851678Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T11:12:36.1851750Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T11:12:36.1851824Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T11:12:36.1851896Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 
2025-12-04T11:12:36.1851969Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T11:12:36.1852040Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T11:12:36.1852111Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T11:12:36.1852183Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T11:12:36.1852256Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 2025-12-04T11:12:36.1852326Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T11:12:36.1852399Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T11:12:36.1852472Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T11:12:36.1852544Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T11:12:36.1852621Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T11:12:36.1852692Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T11:12:36.1852762Z * [new branch] gh/malfet/608/head -> origin/gh/malfet/608/head 2025-12-04T11:12:36.1852836Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T11:12:36.1852909Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T11:12:36.1852982Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T11:12:36.1853056Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T11:12:36.1853127Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T11:12:36.1853200Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T11:12:36.1853271Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T11:12:36.1853343Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T11:12:36.1853417Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T11:12:36.1853488Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T11:12:36.1854092Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T11:12:36.1854168Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T11:12:36.1854263Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T11:12:36.1854336Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T11:12:36.1854408Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T11:12:36.1854502Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T11:12:36.1854592Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T11:12:36.1854682Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T11:12:36.1854755Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T11:12:36.1854834Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T11:12:36.1854913Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 2025-12-04T11:12:36.1854990Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T11:12:36.1855065Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T11:12:36.1855142Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T11:12:36.1855216Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T11:12:36.1855291Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 
2025-12-04T11:12:36.1855367Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T11:12:36.1855442Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T11:12:36.1855521Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T11:12:36.1855596Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T11:12:36.1855669Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-12-04T11:12:36.1855743Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T11:12:36.1855817Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T11:12:36.1855890Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T11:12:36.1855966Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T11:12:36.1856039Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T11:12:36.1856140Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T11:12:36.1856237Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T11:12:36.1856331Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T11:12:36.1856423Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T11:12:36.1856517Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T11:12:36.1856609Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T11:12:36.1856704Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T11:12:36.1856797Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T11:12:36.1856915Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T11:12:36.1857008Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T11:12:36.1857119Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T11:12:36.1857210Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T11:12:36.1857301Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T11:12:36.1857392Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T11:12:36.1857484Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T11:12:36.1857576Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T11:12:36.1857669Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T11:12:36.1857761Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T11:12:36.1857857Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T11:12:36.1857949Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T11:12:36.1858040Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T11:12:36.1858133Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T11:12:36.1858224Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 2025-12-04T11:12:36.1858317Z * [new 
branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T11:12:36.1858411Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T11:12:36.1858502Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T11:12:36.1858598Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T11:12:36.1858690Z * [new branch] gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T11:12:36.1858781Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T11:12:36.1858874Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T11:12:36.1858964Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T11:12:36.1859055Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T11:12:36.1859149Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T11:12:36.1859241Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T11:12:36.1859333Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T11:12:36.1859426Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T11:12:36.1859518Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T11:12:36.1859610Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T11:12:36.1859754Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T11:12:36.1859847Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T11:12:36.1859973Z * [new branch] gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base 2025-12-04T11:12:36.1860064Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T11:12:36.1860184Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T11:12:36.1860276Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T11:12:36.1860366Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T11:12:36.1860458Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T11:12:36.1860551Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T11:12:36.1860642Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T11:12:36.1860737Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T11:12:36.1860829Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T11:12:36.1860922Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T11:12:36.1861013Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T11:12:36.1861106Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T11:12:36.1861197Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T11:12:36.1861290Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 
2025-12-04T11:12:36.1861381Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T11:12:36.1861474Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T11:12:36.1861566Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T11:12:36.1861659Z * [new branch] gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 2025-12-04T11:12:36.1861750Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T11:12:36.1861843Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T11:12:36.1861935Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T11:12:36.1862028Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T11:12:36.1862121Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T11:12:36.1862213Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T11:12:36.1862306Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T11:12:36.1862402Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T11:12:36.1862495Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T11:12:36.1862590Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T11:12:36.1862681Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T11:12:36.1862772Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T11:12:36.1862865Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T11:12:36.1862976Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T11:12:36.1863067Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T11:12:36.1863179Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T11:12:36.1863272Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 2025-12-04T11:12:36.1863364Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T11:12:36.1863457Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T11:12:36.1863548Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T11:12:36.1863640Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T11:12:36.1863733Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T11:12:36.1863825Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T11:12:36.1863917Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T11:12:36.1864013Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T11:12:36.1864104Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T11:12:36.1864198Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T11:12:36.1864288Z * [new branch] gh/mikaylagawarecki/371/head -> 
origin/gh/mikaylagawarecki/371/head 2025-12-04T11:12:36.1864378Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T11:12:36.1864472Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T11:12:36.1864564Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T11:12:36.1864656Z * [new branch] gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig 2025-12-04T11:12:36.1864750Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T11:12:36.1864842Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T11:12:36.1864934Z * [new branch] gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig 2025-12-04T11:12:36.1865026Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T11:12:36.1865118Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T11:12:36.1865211Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T11:12:36.1865305Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T11:12:36.1865398Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T11:12:36.1865492Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T11:12:36.1865584Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T11:12:36.1865675Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T11:12:36.1865767Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T11:12:36.1865858Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T11:12:36.1865968Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T11:12:36.1866059Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T11:12:36.1866178Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T11:12:36.1866268Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T11:12:36.1866360Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T11:12:36.1866450Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T11:12:36.1866540Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T11:12:36.1866632Z * [new branch] gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T11:12:36.1866725Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T11:12:36.1866815Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T11:12:36.1866907Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T11:12:36.1866997Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T11:12:36.1867089Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T11:12:36.1867180Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T11:12:36.1867272Z * [new branch] 
gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T11:12:36.1867366Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T11:12:36.1867458Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T11:12:36.1867549Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T11:12:36.1867644Z * [new branch] gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T11:12:36.1867734Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T11:12:36.1867825Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T11:12:36.1867920Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T11:12:36.1868012Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T11:12:36.1868103Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T11:12:36.1868197Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 2025-12-04T11:12:36.1868288Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T11:12:36.1868384Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T11:12:36.1868475Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T11:12:36.1868567Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T11:12:36.1868660Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T11:12:36.1868752Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T11:12:36.1868845Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T11:12:36.1868957Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T11:12:36.1869048Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T11:12:36.1869159Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T11:12:36.1869252Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T11:12:36.1869343Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T11:12:36.1869432Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T11:12:36.1869525Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T11:12:36.1869615Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T11:12:36.1869734Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T11:12:36.1869831Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T11:12:36.1869923Z * [new branch] gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head 2025-12-04T11:12:36.1870017Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T11:12:36.1870107Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T11:12:36.1870198Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 
2025-12-04T11:12:36.1870290Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T11:12:36.1870364Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T11:12:36.1870438Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T11:12:36.1870511Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T11:12:36.1870585Z * [new branch] gh/mlazos/42/base -> origin/gh/mlazos/42/base 2025-12-04T11:12:36.1870655Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T11:12:36.1870726Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T11:12:36.1870796Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T11:12:36.1870866Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T11:12:36.1870940Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T11:12:36.1871011Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T11:12:36.1871083Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T11:12:36.1871156Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T11:12:36.1871226Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T11:12:36.1871298Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T11:12:36.1871368Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T11:12:36.1871439Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T11:12:36.1871510Z * [new branch] gh/mlazos/48/head -> origin/gh/mlazos/48/head 2025-12-04T11:12:36.1871581Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T11:12:36.1871676Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T11:12:36.1871749Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T11:12:36.1871819Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T11:12:36.1871922Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T11:12:36.1871995Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T11:12:36.1872064Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T11:12:36.1872133Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T11:12:36.1872205Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T11:12:36.1872276Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T11:12:36.1872347Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T11:12:36.1872419Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T11:12:36.1872488Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T11:12:36.1872559Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T11:12:36.1872632Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T11:12:36.1872702Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T11:12:36.1872772Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T11:12:36.1872841Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T11:12:36.1872910Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T11:12:36.1872984Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T11:12:36.1873054Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 
2025-12-04T11:12:36.1873125Z * [new branch] gh/mlazos/55/orig -> origin/gh/mlazos/55/orig 2025-12-04T11:12:36.1873199Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T11:12:36.1873269Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T11:12:36.1873341Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T11:12:36.1873414Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 2025-12-04T11:12:36.1873484Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T11:12:36.1873556Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T11:12:36.1873631Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T11:12:36.1873702Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T11:12:36.1873775Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T11:12:36.1873847Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T11:12:36.1873917Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T11:12:36.1873988Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T11:12:36.1874061Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T11:12:36.1874131Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T11:12:36.1874200Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T11:12:36.1874301Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T11:12:36.1874373Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T11:12:36.1874472Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T11:12:36.1874542Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T11:12:36.1874613Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T11:12:36.1874683Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T11:12:36.1874753Z * [new branch] gh/mlazos/63/base -> origin/gh/mlazos/63/base 2025-12-04T11:12:36.1874826Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T11:12:36.1874897Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T11:12:36.1874969Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T11:12:36.1875040Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T11:12:36.1875114Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T11:12:36.1875182Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T11:12:36.1875252Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T11:12:36.1875324Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T11:12:36.1875395Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T11:12:36.1875465Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T11:12:36.1875539Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T11:12:36.1875608Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T11:12:36.1875677Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T11:12:36.1875750Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T11:12:36.1875819Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T11:12:36.1875891Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T11:12:36.1875960Z * [new branch] 
gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T11:12:36.1876030Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T11:12:36.1876101Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T11:12:36.1876172Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T11:12:36.1876243Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T11:12:36.1876314Z * [new branch] gh/mlazos/70/head -> origin/gh/mlazos/70/head 2025-12-04T11:12:36.1876387Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T11:12:36.1876458Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T11:12:36.1876531Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T11:12:36.1876601Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T11:12:36.1876670Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T11:12:36.1876744Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T11:12:36.1876838Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T11:12:36.1876908Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T11:12:36.1876979Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T11:12:36.1877075Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T11:12:36.1877147Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T11:12:36.1877221Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T11:12:36.1877299Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T11:12:36.1877378Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T11:12:36.1877453Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T11:12:36.1877544Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T11:12:36.1877633Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T11:12:36.1877719Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T11:12:36.1877803Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T11:12:36.1877890Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T11:12:36.1877973Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-12-04T11:12:36.1878056Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T11:12:36.1878141Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T11:12:36.1878225Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T11:12:36.1878310Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T11:12:36.1878396Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T11:12:36.1878483Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T11:12:36.1878566Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T11:12:36.1878649Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T11:12:36.1878733Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T11:12:36.1878820Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 
2025-12-04T11:12:36.1878906Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T11:12:36.1878989Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T11:12:36.1879076Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T11:12:36.1879161Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T11:12:36.1879245Z * [new branch] gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T11:12:36.1879329Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T11:12:36.1879413Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T11:12:36.1879496Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T11:12:36.1879581Z * [new branch] gh/naveenthangudu/9/base -> origin/gh/naveenthangudu/9/base 2025-12-04T11:12:36.1879683Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T11:12:36.1879807Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T11:12:36.1879919Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T11:12:36.1879994Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T11:12:36.1880069Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T11:12:36.1880149Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T11:12:36.1880225Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T11:12:36.1880300Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T11:12:36.1880376Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T11:12:36.1880451Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T11:12:36.1880529Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T11:12:36.1880605Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T11:12:36.1880679Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T11:12:36.1880755Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T11:12:36.1880829Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T11:12:36.1880903Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T11:12:36.1880979Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T11:12:36.1881055Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T11:12:36.1881128Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T11:12:36.1881206Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T11:12:36.1881280Z * [new branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T11:12:36.1881354Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T11:12:36.1881430Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T11:12:36.1881503Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T11:12:36.1881578Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T11:12:36.1881657Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T11:12:36.1881733Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 
2025-12-04T11:12:36.1881809Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T11:12:36.1881884Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T11:12:36.1881958Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T11:12:36.1882033Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T11:12:36.1882106Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 2025-12-04T11:12:36.1882181Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T11:12:36.1882257Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T11:12:36.1882372Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T11:12:36.1882445Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T11:12:36.1882519Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T11:12:36.1882610Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T11:12:36.1882684Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T11:12:36.1882758Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T11:12:36.1882832Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T11:12:36.1882906Z * [new branch] gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T11:12:36.1882982Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T11:12:36.1883059Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T11:12:36.1883132Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T11:12:36.1883207Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T11:12:36.1883278Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T11:12:36.1883349Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T11:12:36.1883418Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T11:12:36.1883487Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T11:12:36.1883561Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T11:12:36.1883630Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T11:12:36.1883700Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T11:12:36.1883773Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T11:12:36.1883843Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T11:12:36.1883914Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T11:12:36.1883985Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T11:12:36.1884055Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T11:12:36.1884123Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T11:12:36.1884195Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T11:12:36.1884266Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T11:12:36.1884338Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T11:12:36.1884412Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T11:12:36.1884482Z * [new branch] gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T11:12:36.1884551Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T11:12:36.1884621Z * [new branch] gh/oulgen/17/base -> 
origin/gh/oulgen/17/base 2025-12-04T11:12:36.1884691Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T11:12:36.1884762Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T11:12:36.1884834Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T11:12:36.1884925Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T11:12:36.1884998Z * [new branch] gh/oulgen/18/orig -> origin/gh/oulgen/18/orig 2025-12-04T11:12:36.1885068Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T11:12:36.1885161Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T11:12:36.1885234Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T11:12:36.1885304Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T11:12:36.1885374Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T11:12:36.1885447Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T11:12:36.1885516Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T11:12:36.1885588Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T11:12:36.1885660Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T11:12:36.1885729Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T11:12:36.1885801Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T11:12:36.1885873Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T11:12:36.1885943Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T11:12:36.1886014Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T11:12:36.1886084Z * [new branch] gh/oulgen/23/orig -> origin/gh/oulgen/23/orig 2025-12-04T11:12:36.1886154Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T11:12:36.1886227Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T11:12:36.1886299Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T11:12:36.1886370Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T11:12:36.1886445Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T11:12:36.1886516Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T11:12:36.1886589Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T11:12:36.1886660Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T11:12:36.1886731Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T11:12:36.1886803Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T11:12:36.1886878Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T11:12:36.1886949Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T11:12:36.1887020Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T11:12:36.1887088Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T11:12:36.1887158Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T11:12:36.1887226Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T11:12:36.1887296Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T11:12:36.1887365Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T11:12:36.1887433Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T11:12:36.1887535Z * [new 
branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T11:12:36.1887604Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T11:12:36.1887724Z * [new branch] gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization 2025-12-04T11:12:36.1887801Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T11:12:36.1887874Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-12-04T11:12:36.1887947Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T11:12:36.1888018Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T11:12:36.1888089Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T11:12:36.1888161Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T11:12:36.1888235Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T11:12:36.1888306Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T11:12:36.1888380Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T11:12:36.1888451Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T11:12:36.1888522Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T11:12:36.1888592Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T11:12:36.1888663Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T11:12:36.1888733Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T11:12:36.1888805Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T11:12:36.1888876Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T11:12:36.1888947Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T11:12:36.1889021Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T11:12:36.1889092Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T11:12:36.1889162Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T11:12:36.1889237Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T11:12:36.1889307Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-12-04T11:12:36.1889380Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T11:12:36.1889452Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T11:12:36.1889523Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T11:12:36.1889594Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T11:12:36.1889666Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T11:12:36.1889775Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T11:12:36.1889850Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T11:12:36.1889920Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T11:12:36.1889990Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T11:12:36.1890064Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T11:12:36.1890169Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T11:12:36.1890239Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T11:12:36.1890338Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T11:12:36.1890408Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T11:12:36.1890478Z * [new branch] gh/pearu/142/base 
-> origin/gh/pearu/142/base 2025-12-04T11:12:36.1890553Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T11:12:36.1890625Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T11:12:36.1890695Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T11:12:36.1890768Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T11:12:36.1890843Z * [new branch] gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T11:12:36.1890915Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T11:12:36.1890988Z * [new branch] gh/pearu/147/head -> origin/gh/pearu/147/head 2025-12-04T11:12:36.1891060Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T11:12:36.1891135Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T11:12:36.1891208Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T11:12:36.1891281Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T11:12:36.1891356Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T11:12:36.1891427Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T11:12:36.1891498Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T11:12:36.1891572Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T11:12:36.1891644Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T11:12:36.1891714Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T11:12:36.1891787Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T11:12:36.1891859Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T11:12:36.1897903Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T11:12:36.1897994Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T11:12:36.1898072Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T11:12:36.1898144Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T11:12:36.1898217Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T11:12:36.1898295Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T11:12:36.1898368Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T11:12:36.1898439Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T11:12:36.1898512Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T11:12:36.1898583Z * [new branch] gh/pearu/155/orig -> origin/gh/pearu/155/orig 2025-12-04T11:12:36.1898653Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T11:12:36.1898766Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T11:12:36.1898838Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T11:12:36.1898911Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T11:12:36.1899011Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T11:12:36.1899085Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T11:12:36.1899158Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T11:12:36.1899230Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T11:12:36.1899300Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T11:12:36.1899380Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 
2025-12-04T11:12:36.1899459Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T11:12:36.1899533Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T11:12:36.1899607Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T11:12:36.1899683Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T11:12:36.1899798Z * [new branch] gh/pianpwk/29/base -> origin/gh/pianpwk/29/base 2025-12-04T11:12:36.1899872Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T11:12:36.1899944Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T11:12:36.1900017Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T11:12:36.1900091Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T11:12:36.1900170Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T11:12:36.1900243Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T11:12:36.1900322Z * [new branch] gh/pianpwk/31/head -> origin/gh/pianpwk/31/head 2025-12-04T11:12:36.1900395Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T11:12:36.1900468Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T11:12:36.1900542Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T11:12:36.1900615Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T11:12:36.1900687Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T11:12:36.1900761Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T11:12:36.1900835Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T11:12:36.1900907Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T11:12:36.1900985Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T11:12:36.1901057Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T11:12:36.1901131Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T11:12:36.1901204Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T11:12:36.1901276Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T11:12:36.1901348Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T11:12:36.1901455Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T11:12:36.1901523Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T11:12:36.1901590Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T11:12:36.1901689Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T11:12:36.1901759Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T11:12:36.1901825Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T11:12:36.1901893Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T11:12:36.1901961Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-12-04T11:12:36.1902029Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T11:12:36.1902100Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T11:12:36.1902167Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T11:12:36.1902234Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T11:12:36.1902302Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T11:12:36.1902372Z * [new branch] 
gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T11:12:36.1902441Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T11:12:36.1902507Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T11:12:36.1902573Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T11:12:36.1902640Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T11:12:36.1902707Z * [new branch] gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T11:12:36.1902772Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T11:12:36.1902841Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T11:12:36.1902908Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T11:12:36.1902977Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T11:12:36.1903044Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T11:12:36.1903111Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T11:12:36.1903178Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T11:12:36.1903244Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T11:12:36.1903311Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T11:12:36.1903379Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T11:12:36.1903444Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T11:12:36.1903512Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 2025-12-04T11:12:36.1903579Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T11:12:36.1903646Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T11:12:36.1903711Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T11:12:36.1903779Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T11:12:36.1903850Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T11:12:36.1903946Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T11:12:36.1904014Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T11:12:36.1904084Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T11:12:36.1904173Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T11:12:36.1904241Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T11:12:36.1904310Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T11:12:36.1904376Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T11:12:36.1904442Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T11:12:36.1904512Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T11:12:36.1904579Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T11:12:36.1904675Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T11:12:36.1904765Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T11:12:36.1904854Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T11:12:36.1904940Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T11:12:36.1905026Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T11:12:36.1905110Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T11:12:36.1905197Z * [new branch] gh/robert-hardwick/5/base -> origin/gh/robert-hardwick/5/base 
2025-12-04T11:12:36.1905284Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T11:12:36.1905369Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T11:12:36.1905455Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T11:12:36.1905541Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 2025-12-04T11:12:36.1905626Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T11:12:36.1905712Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T11:12:36.1905796Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T11:12:36.1905880Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T11:12:36.1905966Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T11:12:36.1906053Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T11:12:36.1906138Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T11:12:36.1906225Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T11:12:36.1906310Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T11:12:36.1906395Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T11:12:36.1906469Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T11:12:36.1906540Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T11:12:36.1906610Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T11:12:36.1906704Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T11:12:36.1906779Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T11:12:36.1906853Z * [new branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T11:12:36.1906943Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T11:12:36.1907014Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T11:12:36.1907086Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T11:12:36.1907156Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T11:12:36.1907226Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T11:12:36.1907297Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T11:12:36.1907370Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T11:12:36.1907440Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T11:12:36.1907511Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T11:12:36.1907583Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T11:12:36.1907653Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T11:12:36.1907727Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T11:12:36.1907797Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T11:12:36.1907867Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T11:12:36.1907938Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T11:12:36.1908010Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T11:12:36.1908079Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 
2025-12-04T11:12:36.1908151Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T11:12:36.1908223Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T11:12:36.1908295Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T11:12:36.1908365Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T11:12:36.1908435Z * [new branch] gh/rtimpe/29/orig -> origin/gh/rtimpe/29/orig 2025-12-04T11:12:36.1908508Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T11:12:36.1908579Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T11:12:36.1908652Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T11:12:36.1908724Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T11:12:36.1908796Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T11:12:36.1908867Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T11:12:36.1908938Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T11:12:36.1909008Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T11:12:36.1909080Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T11:12:36.1909152Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T11:12:36.1909223Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T11:12:36.1909311Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T11:12:36.1909385Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T11:12:36.1909478Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T11:12:36.1909549Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T11:12:36.1909621Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T11:12:36.1909732Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T11:12:36.1909807Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T11:12:36.1909879Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T11:12:36.1909950Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T11:12:36.1910022Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T11:12:36.1910095Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-12-04T11:12:36.1910182Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T11:12:36.1910267Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T11:12:36.1910348Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T11:12:36.1910429Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T11:12:36.1910511Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T11:12:36.1910591Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T11:12:36.1910672Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T11:12:36.1910754Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T11:12:36.1910833Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T11:12:36.1910915Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T11:12:36.1910996Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 
2025-12-04T11:12:36.1911077Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T11:12:36.1911158Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T11:12:36.1911239Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T11:12:36.1911318Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-12-04T11:12:36.1911400Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T11:12:36.1911480Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T11:12:36.1911561Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T11:12:36.1911642Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T11:12:36.1911722Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T11:12:36.1911801Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T11:12:36.1911885Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T11:12:36.1911963Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T11:12:36.1912074Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T11:12:36.1912152Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T11:12:36.1912262Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T11:12:36.1912337Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T11:12:36.1912415Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T11:12:36.1912491Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T11:12:36.1912569Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T11:12:36.1912646Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T11:12:36.1912723Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T11:12:36.1912799Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T11:12:36.1912875Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T11:12:36.1912952Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T11:12:36.1913030Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T11:12:36.1913105Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T11:12:36.1913181Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T11:12:36.1913256Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T11:12:36.1913331Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-12-04T11:12:36.1913409Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T11:12:36.1913486Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T11:12:36.1913563Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T11:12:36.1913639Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T11:12:36.1913718Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T11:12:36.1913794Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T11:12:36.1913870Z * [new branch] gh/seemethere/72/head -> 
origin/gh/seemethere/72/head 2025-12-04T11:12:36.1913946Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T11:12:36.1914022Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T11:12:36.1914099Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T11:12:36.1914178Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 2025-12-04T11:12:36.1914255Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T11:12:36.1914332Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T11:12:36.1914408Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T11:12:36.1914485Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T11:12:36.1914561Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T11:12:36.1914638Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T11:12:36.1914732Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T11:12:36.1914809Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T11:12:36.1914886Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 2025-12-04T11:12:36.1914990Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T11:12:36.1915071Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T11:12:36.1915150Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T11:12:36.1915230Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T11:12:36.1915311Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T11:12:36.1915390Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T11:12:36.1915471Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T11:12:36.1915555Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T11:12:36.1915636Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T11:12:36.1915714Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T11:12:36.1915796Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T11:12:36.1915875Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T11:12:36.1915954Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T11:12:36.1916034Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T11:12:36.1916114Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T11:12:36.1916194Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T11:12:36.1916274Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T11:12:36.1916353Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T11:12:36.1916433Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T11:12:36.1916511Z * [new branch] gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T11:12:36.1916590Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T11:12:36.1916670Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 
2025-12-04T11:12:36.1916749Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T11:12:36.1916827Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T11:12:36.1916909Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T11:12:36.1916990Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 2025-12-04T11:12:36.1917068Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T11:12:36.1917148Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T11:12:36.1917226Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T11:12:36.1917306Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T11:12:36.1917384Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T11:12:36.1917481Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T11:12:36.1917563Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T11:12:36.1917663Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T11:12:36.1917742Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T11:12:36.1917823Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T11:12:36.1917902Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T11:12:36.1917980Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T11:12:36.1918060Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 2025-12-04T11:12:36.1918140Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T11:12:36.1918217Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T11:12:36.1918296Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T11:12:36.1918377Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T11:12:36.1918454Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T11:12:36.1918534Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T11:12:36.1918613Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T11:12:36.1918691Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T11:12:36.1918772Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T11:12:36.1918850Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T11:12:36.1918930Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T11:12:36.1919009Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T11:12:36.1919087Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T11:12:36.1919167Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T11:12:36.1919245Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T11:12:36.1919322Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T11:12:36.1919400Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T11:12:36.1919478Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 
2025-12-04T11:12:36.1919553Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T11:12:36.1919629Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-12-04T11:12:36.1919748Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T11:12:36.1919825Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T11:12:36.1919899Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T11:12:36.1919977Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T11:12:36.1920056Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T11:12:36.1920134Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T11:12:36.1920238Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T11:12:36.1920313Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T11:12:36.1920420Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T11:12:36.1920493Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T11:12:36.1920567Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T11:12:36.1920640Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T11:12:36.1920713Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T11:12:36.1920788Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T11:12:36.1920862Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T11:12:36.1920935Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T11:12:36.1921009Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T11:12:36.1921085Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T11:12:36.1921159Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T11:12:36.1921233Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T11:12:36.1921306Z * [new branch] gh/slayton58/46/orig -> origin/gh/slayton58/46/orig 2025-12-04T11:12:36.1921380Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T11:12:36.1921457Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T11:12:36.1921534Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T11:12:36.1921608Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T11:12:36.1921691Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T11:12:36.1921767Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T11:12:36.1921844Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T11:12:36.1921921Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T11:12:36.1921997Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T11:12:36.1922074Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T11:12:36.1922150Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T11:12:36.1922225Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T11:12:36.1922303Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T11:12:36.1922381Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 
2025-12-04T11:12:36.1922455Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T11:12:36.1922534Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T11:12:36.1922720Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T11:12:36.1922797Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-12-04T11:12:36.1922874Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T11:12:36.1922971Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-12-04T11:12:36.1923047Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T11:12:36.1923152Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T11:12:36.1923227Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T11:12:36.1923302Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T11:12:36.1923377Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T11:12:36.1923453Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T11:12:36.1923530Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T11:12:36.1923608Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T11:12:36.1923683Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T11:12:36.1923759Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T11:12:36.1923836Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T11:12:36.1923911Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T11:12:36.1923989Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T11:12:36.1924064Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T11:12:36.1924139Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T11:12:36.1924216Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T11:12:36.1924293Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T11:12:36.1924368Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T11:12:36.1924449Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T11:12:36.1924524Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T11:12:36.1924600Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T11:12:36.1924676Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T11:12:36.1924751Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T11:12:36.1924828Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T11:12:36.1924904Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T11:12:36.1924980Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T11:12:36.1925057Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T11:12:36.1925135Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T11:12:36.1925212Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T11:12:36.1925288Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 
2025-12-04T11:12:36.1925363Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T11:12:36.1925438Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T11:12:36.1925515Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T11:12:36.1925610Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-12-04T11:12:36.1925686Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T11:12:36.1925762Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T11:12:36.1925858Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T11:12:36.1925934Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T11:12:36.1926008Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T11:12:36.1926083Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T11:12:36.1926160Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-12-04T11:12:36.1926235Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T11:12:36.1926312Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T11:12:36.1926388Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T11:12:36.1926467Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T11:12:36.1926542Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T11:12:36.1926619Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T11:12:36.1926694Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T11:12:36.1926769Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T11:12:36.1926847Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T11:12:36.1926924Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T11:12:36.1927001Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T11:12:36.1927080Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T11:12:36.1927159Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T11:12:36.1927235Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T11:12:36.1927313Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T11:12:36.1927389Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T11:12:36.1927468Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T11:12:36.1927545Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T11:12:36.1927623Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T11:12:36.1927700Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 2025-12-04T11:12:36.1927777Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T11:12:36.1927852Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T11:12:36.1927929Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T11:12:36.1928004Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T11:12:36.1928079Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 
2025-12-04T11:12:36.1928156Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T11:12:36.1928250Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T11:12:36.1928326Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T11:12:36.1928403Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 2025-12-04T11:12:36.1928500Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T11:12:36.1928575Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T11:12:36.1928654Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T11:12:36.1928728Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T11:12:36.1928805Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T11:12:36.1928878Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T11:12:36.1928954Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T11:12:36.1929030Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T11:12:36.1929105Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T11:12:36.1929178Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T11:12:36.1929252Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T11:12:36.1929326Z * [new branch] gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T11:12:36.1929400Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T11:12:36.1929475Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T11:12:36.1929549Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T11:12:36.1929623Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T11:12:36.1929755Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T11:12:36.1929833Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T11:12:36.1929906Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T11:12:36.1929982Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T11:12:36.1930055Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T11:12:36.1930129Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T11:12:36.1930204Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T11:12:36.1930279Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T11:12:36.1930353Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T11:12:36.1930428Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T11:12:36.1930502Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T11:12:36.1930578Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T11:12:36.1930652Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T11:12:36.1930726Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T11:12:36.1930800Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T11:12:36.1930874Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 2025-12-04T11:12:36.1930995Z * [new branch] gh/swolchok/861/base -> 
origin/gh/swolchok/861/base 2025-12-04T11:12:36.1931071Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T11:12:36.1931171Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T11:12:36.1931244Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T11:12:36.1931319Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 2025-12-04T11:12:36.1931393Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T11:12:36.1931466Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T11:12:36.1931541Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T11:12:36.1931617Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T11:12:36.1931694Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T11:12:36.1931768Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T11:12:36.1931843Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T11:12:36.1931919Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T11:12:36.1931993Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T11:12:36.1932068Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T11:12:36.1932144Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T11:12:36.1932218Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T11:12:36.1932295Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T11:12:36.1932370Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T11:12:36.1932446Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T11:12:36.1932522Z * [new branch] gh/swolchok/867/orig -> origin/gh/swolchok/867/orig 2025-12-04T11:12:36.1932600Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T11:12:36.1932673Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T11:12:36.1932747Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T11:12:36.1932823Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T11:12:36.1932898Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T11:12:36.1932972Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T11:12:36.1933047Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T11:12:36.1933122Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T11:12:36.1933197Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T11:12:36.1933271Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T11:12:36.1933345Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T11:12:36.1933420Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T11:12:36.1933495Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T11:12:36.1933597Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T11:12:36.1933672Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T11:12:36.1933749Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T11:12:36.1933847Z * [new branch] gh/tianyu-l/2/head -> 
origin/gh/tianyu-l/2/head 2025-12-04T11:12:36.1933919Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T11:12:36.1933990Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T11:12:36.1934060Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T11:12:36.1934132Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-12-04T11:12:36.1934202Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T11:12:36.1934275Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T11:12:36.1934368Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T11:12:36.1934459Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T11:12:36.1934546Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T11:12:36.1934635Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T11:12:36.1934720Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T11:12:36.1934807Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T11:12:36.1934893Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T11:12:36.1934979Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T11:12:36.1935065Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T11:12:36.1935156Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T11:12:36.1935242Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T11:12:36.1935330Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T11:12:36.1935416Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T11:12:36.1935500Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T11:12:36.1935586Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T11:12:36.1935671Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T11:12:36.1935757Z * [new branch] gh/tugsbayasgalan/32/head -> origin/gh/tugsbayasgalan/32/head 2025-12-04T11:12:36.1935845Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T11:12:36.1935932Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T11:12:36.1936018Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T11:12:36.1936104Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T11:12:36.1936188Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T11:12:36.1936274Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T11:12:36.1936359Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T11:12:36.1936462Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T11:12:36.1936549Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T11:12:36.1936652Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T11:12:36.1936738Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 
2025-12-04T11:12:36.1936822Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T11:12:36.1936906Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T11:12:36.1936990Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T11:12:36.1937075Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 2025-12-04T11:12:36.1937160Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T11:12:36.1937246Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T11:12:36.1937333Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T11:12:36.1937417Z * [new branch] gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T11:12:36.1937500Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T11:12:36.1937585Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T11:12:36.1937669Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T11:12:36.1937754Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T11:12:36.1937839Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T11:12:36.1937924Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T11:12:36.1938011Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T11:12:36.1938099Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T11:12:36.1938183Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T11:12:36.1938267Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T11:12:36.1938353Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T11:12:36.1938436Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T11:12:36.1938525Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T11:12:36.1938609Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T11:12:36.1938694Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T11:12:36.1938783Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T11:12:36.1938867Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T11:12:36.1938953Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T11:12:36.1939038Z * [new branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T11:12:36.1939122Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T11:12:36.1939207Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T11:12:36.1939312Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T11:12:36.1939397Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T11:12:36.1939500Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T11:12:36.1939585Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T11:12:36.1939670Z * [new branch] 
gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T11:12:36.1939797Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T11:12:36.1939883Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T11:12:36.1939967Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T11:12:36.1940056Z * [new branch] gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T11:12:36.1940141Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T11:12:36.1940227Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T11:12:36.1940313Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T11:12:36.1940397Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T11:12:36.1940483Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T11:12:36.1940567Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T11:12:36.1940651Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 2025-12-04T11:12:36.1940737Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T11:12:36.1940821Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T11:12:36.1940905Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T11:12:36.1940991Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T11:12:36.1941076Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T11:12:36.1941159Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T11:12:36.1941245Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T11:12:36.1941332Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T11:12:36.1941419Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T11:12:36.1941505Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T11:12:36.1941588Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T11:12:36.1941674Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T11:12:36.1941761Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T11:12:36.1941845Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T11:12:36.1941932Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T11:12:36.1942016Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T11:12:36.1942102Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T11:12:36.1942214Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T11:12:36.1942298Z * [new branch] gh/tugsbayasgalan/77/head -> origin/gh/tugsbayasgalan/77/head 2025-12-04T11:12:36.1942407Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T11:12:36.1942493Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T11:12:36.1942578Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 
2025-12-04T11:12:36.1942662Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T11:12:36.1942747Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T11:12:36.1942833Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T11:12:36.1942920Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 2025-12-04T11:12:36.1943007Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T11:12:36.1943093Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T11:12:36.1943177Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T11:12:36.1943262Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T11:12:36.1943348Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T11:12:36.1943433Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T11:12:36.1943517Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T11:12:36.1943603Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T11:12:36.1943688Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T11:12:36.1943772Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T11:12:36.1943858Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T11:12:36.1943945Z * [new branch] gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T11:12:36.1944031Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T11:12:36.1944115Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T11:12:36.1944200Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T11:12:36.1944284Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T11:12:36.1944369Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T11:12:36.1944455Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T11:12:36.1944541Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T11:12:36.1944625Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T11:12:36.1944710Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T11:12:36.1944793Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T11:12:36.1944878Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T11:12:36.1944962Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T11:12:36.1945075Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T11:12:36.1945161Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T11:12:36.1945265Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T11:12:36.1945349Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T11:12:36.1945434Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T11:12:36.1945519Z * [new branch] 
gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T11:12:36.1945605Z * [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T11:12:36.1945691Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T11:12:36.1945776Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T11:12:36.1945862Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T11:12:36.1945949Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T11:12:36.1946033Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T11:12:36.1946119Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T11:12:36.1946205Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T11:12:36.1946289Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T11:12:36.1946373Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T11:12:36.1946458Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T11:12:36.1946542Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T11:12:36.1946627Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T11:12:36.1946713Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T11:12:36.1946796Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T11:12:36.1946883Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T11:12:36.1946968Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T11:12:36.1947052Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T11:12:36.1947123Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T11:12:36.1947194Z * [new branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T11:12:36.1947261Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T11:12:36.1947330Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T11:12:36.1947397Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T11:12:36.1947464Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T11:12:36.1947530Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T11:12:36.1947596Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T11:12:36.1947662Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T11:12:36.1947728Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T11:12:36.1947814Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T11:12:36.1947881Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T11:12:36.1947966Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T11:12:36.1948032Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T11:12:36.1948098Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T11:12:36.1948165Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T11:12:36.1948230Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T11:12:36.1948296Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T11:12:36.1948381Z * [new 
branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T11:12:36.1948465Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T11:12:36.1948545Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T11:12:36.1948624Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T11:12:36.1948702Z * [new branch] gh/vishal9-team/2/orig -> origin/gh/vishal9-team/2/orig 2025-12-04T11:12:36.1948781Z * [new branch] gh/vishal9-team/3/base -> origin/gh/vishal9-team/3/base 2025-12-04T11:12:36.1948859Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T11:12:36.1948937Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T11:12:36.1949016Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T11:12:36.1949094Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T11:12:36.1949172Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T11:12:36.1949244Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T11:12:36.1949313Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T11:12:36.1949382Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T11:12:36.1949460Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T11:12:36.1949537Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T11:12:36.1949615Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T11:12:36.1949745Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T11:12:36.1949823Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T11:12:36.1949898Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T11:12:36.1949974Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T11:12:36.1950048Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T11:12:36.1950124Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T11:12:36.1950198Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T11:12:36.1950272Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T11:12:36.1950347Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T11:12:36.1950466Z * [new branch] gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T11:12:36.1950541Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T11:12:36.1950617Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T11:12:36.1950721Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T11:12:36.1950795Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T11:12:36.1950871Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T11:12:36.1950945Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T11:12:36.1951020Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T11:12:36.1951093Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T11:12:36.1951169Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T11:12:36.1951244Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 
2025-12-04T11:12:36.1951321Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T11:12:36.1951395Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T11:12:36.1951469Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T11:12:36.1951543Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T11:12:36.1951618Z * [new branch] gh/wconstab/453/base -> origin/gh/wconstab/453/base 2025-12-04T11:12:36.1951693Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T11:12:36.1951766Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T11:12:36.1951841Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T11:12:36.1951918Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T11:12:36.1951993Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 2025-12-04T11:12:36.1952068Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T11:12:36.1952144Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T11:12:36.1952218Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T11:12:36.1952293Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T11:12:36.1952367Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T11:12:36.1952442Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T11:12:36.1952517Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T11:12:36.1952592Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T11:12:36.1952669Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T11:12:36.1952743Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T11:12:36.1952817Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T11:12:36.1952891Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T11:12:36.1952967Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T11:12:36.1953040Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T11:12:36.1953138Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T11:12:36.1953213Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T11:12:36.1953314Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T11:12:36.1953387Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T11:12:36.1953463Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T11:12:36.1953537Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T11:12:36.1953610Z * [new branch] gh/wconstab/461/orig -> origin/gh/wconstab/461/orig 2025-12-04T11:12:36.1953686Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T11:12:36.1953763Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T11:12:36.1953837Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T11:12:36.1953912Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T11:12:36.1953988Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T11:12:36.1954062Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 
2025-12-04T11:12:36.1954136Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T11:12:36.1954209Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T11:12:36.1954284Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T11:12:36.1954359Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T11:12:36.1954433Z * [new branch] gh/wconstab/465/head -> origin/gh/wconstab/465/head 2025-12-04T11:12:36.1954509Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T11:12:36.1954586Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T11:12:36.1954660Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T11:12:36.1954735Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T11:12:36.1954809Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T11:12:36.1954883Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T11:12:36.1954959Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T11:12:36.1955033Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T11:12:36.1955109Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T11:12:36.1955184Z * [new branch] gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T11:12:36.1955263Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T11:12:36.1955342Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T11:12:36.1955417Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T11:12:36.1955491Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T11:12:36.1955567Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T11:12:36.1955641Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T11:12:36.1955741Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T11:12:36.1955819Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T11:12:36.1955897Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T11:12:36.1956006Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T11:12:36.1956094Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T11:12:36.1956177Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T11:12:36.1956260Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T11:12:36.1956344Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T11:12:36.1956425Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T11:12:36.1956509Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T11:12:36.1956593Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T11:12:36.1956675Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T11:12:36.1956758Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T11:12:36.1956840Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T11:12:36.1956923Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 
2025-12-04T11:12:36.1957009Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T11:12:36.1957090Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T11:12:36.1957172Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T11:12:36.1957256Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T11:12:36.1957338Z * [new branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T11:12:36.1957420Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T11:12:36.1957503Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T11:12:36.1957584Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T11:12:36.1957666Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T11:12:36.1957751Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T11:12:36.1957833Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T11:12:36.1957916Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T11:12:36.1957997Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T11:12:36.1958080Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T11:12:36.1958164Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T11:12:36.1958246Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T11:12:36.1958328Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T11:12:36.1958412Z * [new branch] gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T11:12:36.1958494Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T11:12:36.1958594Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T11:12:36.1958677Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T11:12:36.1958779Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T11:12:36.1958859Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T11:12:36.1958942Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T11:12:36.1959023Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T11:12:36.1959104Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T11:12:36.1959186Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T11:12:36.1959271Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T11:12:36.1959354Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T11:12:36.1959438Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T11:12:36.1959519Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T11:12:36.1959603Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T11:12:36.1959684Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T11:12:36.1959818Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 
2025-12-04T11:12:36.1959902Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T11:12:36.1959986Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T11:12:36.1960068Z * [new branch] gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T11:12:36.1960150Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T11:12:36.1960233Z * [new branch] gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T11:12:36.1960314Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T11:12:36.1960399Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T11:12:36.1960481Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T11:12:36.1960563Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T11:12:36.1960646Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T11:12:36.1960728Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T11:12:36.1960809Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T11:12:36.1960894Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T11:12:36.1960975Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T11:12:36.1961057Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T11:12:36.1961139Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T11:12:36.1961221Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T11:12:36.1961303Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T11:12:36.1961435Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T11:12:36.1961516Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T11:12:36.1961638Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T11:12:36.1961720Z * [new branch] gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T11:12:36.1961802Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T11:12:36.1961886Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T11:12:36.1961968Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T11:12:36.1962049Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T11:12:36.1962132Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T11:12:36.1962214Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T11:12:36.1962296Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T11:12:36.1962382Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T11:12:36.1962463Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T11:12:36.1962544Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T11:12:36.1962626Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T11:12:36.1962707Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 
2025-12-04T11:12:36.1962790Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T11:12:36.1962874Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T11:12:36.1962955Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T11:12:36.1963039Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T11:12:36.1963121Z * [new branch] gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T11:12:36.1963204Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T11:12:36.1963289Z * [new branch] gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T11:12:36.1963373Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T11:12:36.1963455Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T11:12:36.1963540Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T11:12:36.1963622Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T11:12:36.1963705Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T11:12:36.1963790Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T11:12:36.1963872Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T11:12:36.1963953Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T11:12:36.1964034Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T11:12:36.1964116Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T11:12:36.1964228Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T11:12:36.1964310Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T11:12:36.1964391Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T11:12:36.1964518Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T11:12:36.1964600Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T11:12:36.1964681Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T11:12:36.1964765Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T11:12:36.1964848Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T11:12:36.1964930Z * [new branch] gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T11:12:36.1965015Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T11:12:36.1965098Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T11:12:36.1965184Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T11:12:36.1965267Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T11:12:36.1965348Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T11:12:36.1965431Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T11:12:36.1965512Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T11:12:36.1965594Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 
2025-12-04T11:12:36.1965679Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T11:12:36.1965761Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T11:12:36.1965845Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T11:12:36.1965928Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T11:12:36.1966009Z * [new branch] gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T11:12:36.1966089Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T11:12:36.1966174Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T11:12:36.1966256Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T11:12:36.1966339Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T11:12:36.1966424Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T11:12:36.1966506Z * [new branch] gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T11:12:36.1966593Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T11:12:36.1966673Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T11:12:36.1966754Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T11:12:36.1966835Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T11:12:36.1966916Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T11:12:36.1967023Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T11:12:36.1967108Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T11:12:36.1967189Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T11:12:36.1967293Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T11:12:36.1967375Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T11:12:36.1967456Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T11:12:36.1967539Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T11:12:36.1967623Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T11:12:36.1967697Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T11:12:36.1967770Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T11:12:36.1967844Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T11:12:36.1967915Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T11:12:36.1967985Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T11:12:36.1968057Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T11:12:36.1968126Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T11:12:36.1968198Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T11:12:36.1968267Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T11:12:36.1968336Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T11:12:36.1968408Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T11:12:36.1968479Z * [new branch] gh/xmfan/301/head -> 
origin/gh/xmfan/301/head 2025-12-04T11:12:36.1968551Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T11:12:36.1968624Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T11:12:36.1968695Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T11:12:36.1968767Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T11:12:36.1968837Z * [new branch] gh/xmfan/309/base -> origin/gh/xmfan/309/base 2025-12-04T11:12:36.1968907Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T11:12:36.1968977Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T11:12:36.1969051Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T11:12:36.1969121Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T11:12:36.1969191Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T11:12:36.1969262Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T11:12:36.1969332Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T11:12:36.1969403Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T11:12:36.1969473Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T11:12:36.1969544Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T11:12:36.1969643Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T11:12:36.1969740Z * [new branch] gh/xmfan/313/base -> origin/gh/xmfan/313/base 2025-12-04T11:12:36.1969812Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T11:12:36.1969919Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T11:12:36.1970003Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T11:12:36.1970085Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T11:12:36.1970169Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T11:12:36.1970248Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T11:12:36.1970328Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T11:12:36.1970409Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T11:12:36.1970488Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T11:12:36.1970568Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T11:12:36.1970648Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T11:12:36.1970728Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T11:12:36.1970809Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T11:12:36.1970889Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T11:12:36.1970967Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T11:12:36.1971049Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T11:12:36.1971131Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T11:12:36.1971210Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T11:12:36.1971289Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T11:12:36.1971364Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 
2025-12-04T11:12:36.1971438Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T11:12:36.1971511Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T11:12:36.1971585Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T11:12:36.1971659Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T11:12:36.1971735Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-12-04T11:12:36.1971808Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T11:12:36.1971886Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T11:12:36.1971961Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T11:12:36.1972035Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T11:12:36.1972108Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T11:12:36.1972184Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T11:12:36.1972257Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T11:12:36.1972329Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T11:12:36.1972429Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T11:12:36.1972503Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T11:12:36.1972597Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T11:12:36.1972671Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T11:12:36.1972744Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T11:12:36.1972818Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T11:12:36.1972892Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T11:12:36.1972965Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T11:12:36.1973043Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T11:12:36.1973116Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T11:12:36.1973190Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T11:12:36.1973266Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T11:12:36.1973340Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T11:12:36.1973414Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T11:12:36.1973489Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T11:12:36.1973564Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T11:12:36.1973637Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T11:12:36.1973715Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T11:12:36.1973790Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T11:12:36.1973864Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T11:12:36.1973938Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T11:12:36.1974011Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T11:12:36.1974086Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T11:12:36.1974160Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 
2025-12-04T11:12:36.1974233Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T11:12:36.1974318Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T11:12:36.1974396Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T11:12:36.1974475Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 2025-12-04T11:12:36.1974553Z * [new branch] gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T11:12:36.1974631Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T11:12:36.1974708Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T11:12:36.1974784Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T11:12:36.1974860Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T11:12:36.1974936Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T11:12:36.1975047Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T11:12:36.1975124Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T11:12:36.1975219Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T11:12:36.1975294Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T11:12:36.1975368Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T11:12:36.1975442Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T11:12:36.1975516Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T11:12:36.1975591Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T11:12:36.1975669Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T11:12:36.1975744Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T11:12:36.1975818Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T11:12:36.1975893Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T11:12:36.1975968Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T11:12:36.1976041Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T11:12:36.1976117Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T11:12:36.1976190Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T11:12:36.1976263Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T11:12:36.1976340Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T11:12:36.1976413Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T11:12:36.1976487Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T11:12:36.1976562Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T11:12:36.1976634Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T11:12:36.1976706Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T11:12:36.1976780Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T11:12:36.1976850Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T11:12:36.1976920Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T11:12:36.1976991Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 
2025-12-04T11:12:36.1977060Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T11:12:36.1977132Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T11:12:36.1977201Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T11:12:36.1977269Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T11:12:36.1977339Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-12-04T11:12:36.1977410Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T11:12:36.1977480Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T11:12:36.1977551Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T11:12:36.1977639Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-12-04T11:12:36.1977710Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T11:12:36.1977802Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T11:12:36.1977871Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T11:12:36.1977941Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T11:12:36.1978012Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T11:12:36.1978081Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T11:12:36.1978150Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T11:12:36.1978221Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T11:12:36.1978290Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T11:12:36.1978361Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T11:12:36.1978431Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T11:12:36.1978502Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T11:12:36.1978572Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T11:12:36.1978641Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T11:12:36.1978709Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T11:12:36.1978781Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T11:12:36.1978851Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T11:12:36.1978920Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T11:12:36.1978991Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T11:12:36.1979061Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T11:12:36.1979130Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T11:12:36.1979201Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 2025-12-04T11:12:36.1979271Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T11:12:36.1979340Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T11:12:36.1979410Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T11:12:36.1979480Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T11:12:36.1979549Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T11:12:36.1979622Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T11:12:36.1979732Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T11:12:36.1979804Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T11:12:36.1979875Z * [new branch] 
gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T11:12:36.1979946Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T11:12:36.1980018Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T11:12:36.1980087Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T11:12:36.1980198Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 2025-12-04T11:12:36.1980268Z * [new branch] gh/ydwu4/337/orig -> origin/gh/ydwu4/337/orig 2025-12-04T11:12:36.1980372Z * [new branch] gh/ydwu4/339/base -> origin/gh/ydwu4/339/base 2025-12-04T11:12:36.1980442Z * [new branch] gh/ydwu4/339/head -> origin/gh/ydwu4/339/head 2025-12-04T11:12:36.1980515Z * [new branch] gh/ydwu4/339/orig -> origin/gh/ydwu4/339/orig 2025-12-04T11:12:36.1980585Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-12-04T11:12:36.1980653Z * [new branch] gh/yf225/133/head -> origin/gh/yf225/133/head 2025-12-04T11:12:36.1980725Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-12-04T11:12:36.1980796Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-12-04T11:12:36.1980878Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-12-04T11:12:36.1980957Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-12-04T11:12:36.1981034Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-12-04T11:12:36.1981108Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-12-04T11:12:36.1981185Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-12-04T11:12:36.1981259Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-12-04T11:12:36.1981335Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-12-04T11:12:36.1981413Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-12-04T11:12:36.1981489Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-12-04T11:12:36.1981567Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-12-04T11:12:36.1981644Z * [new branch] gh/yushangdi/1/base -> origin/gh/yushangdi/1/base 2025-12-04T11:12:36.1981722Z * [new branch] gh/yushangdi/1/head -> origin/gh/yushangdi/1/head 2025-12-04T11:12:36.1981798Z * [new branch] gh/yushangdi/10/base -> origin/gh/yushangdi/10/base 2025-12-04T11:12:36.1981873Z * [new branch] gh/yushangdi/10/head -> origin/gh/yushangdi/10/head 2025-12-04T11:12:36.1981948Z * [new branch] gh/yushangdi/10/orig -> origin/gh/yushangdi/10/orig 2025-12-04T11:12:36.1982024Z * [new branch] gh/yushangdi/11/base -> origin/gh/yushangdi/11/base 2025-12-04T11:12:36.1982099Z * [new branch] gh/yushangdi/11/head -> origin/gh/yushangdi/11/head 2025-12-04T11:12:36.1982176Z * [new branch] gh/yushangdi/11/orig -> origin/gh/yushangdi/11/orig 2025-12-04T11:12:36.1982251Z * [new branch] gh/yushangdi/2/base -> origin/gh/yushangdi/2/base 2025-12-04T11:12:36.1982327Z * [new branch] gh/yushangdi/2/head -> origin/gh/yushangdi/2/head 2025-12-04T11:12:36.1982401Z * [new branch] gh/yushangdi/7/base -> origin/gh/yushangdi/7/base 2025-12-04T11:12:36.1982477Z * [new branch] gh/yushangdi/7/head -> origin/gh/yushangdi/7/head 2025-12-04T11:12:36.1982551Z * [new branch] gh/yushangdi/7/orig -> origin/gh/yushangdi/7/orig 2025-12-04T11:12:36.1982624Z * [new branch] gh/yushangdi/8/base -> origin/gh/yushangdi/8/base 2025-12-04T11:12:36.1982700Z * [new branch] gh/yushangdi/8/head -> origin/gh/yushangdi/8/head 2025-12-04T11:12:36.1982773Z * [new branch] 
gh/yushangdi/8/orig -> origin/gh/yushangdi/8/orig 2025-12-04T11:12:36.1982878Z * [new branch] gh/yushangdi/9/base -> origin/gh/yushangdi/9/base 2025-12-04T11:12:36.1982954Z * [new branch] gh/yushangdi/9/head -> origin/gh/yushangdi/9/head 2025-12-04T11:12:36.1983046Z * [new branch] gh/yushangdi/9/orig -> origin/gh/yushangdi/9/orig 2025-12-04T11:12:36.1983120Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-12-04T11:12:36.1983193Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-12-04T11:12:36.1983263Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-12-04T11:12:36.1983335Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-12-04T11:12:36.1983405Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-12-04T11:12:36.1983477Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-12-04T11:12:36.1983548Z * [new branch] gh/zklaus/21/base -> origin/gh/zklaus/21/base 2025-12-04T11:12:36.1983617Z * [new branch] gh/zklaus/21/head -> origin/gh/zklaus/21/head 2025-12-04T11:12:36.1983688Z * [new branch] gh/zklaus/21/orig -> origin/gh/zklaus/21/orig 2025-12-04T11:12:36.1983758Z * [new branch] gh/zklaus/22/base -> origin/gh/zklaus/22/base 2025-12-04T11:12:36.1983828Z * [new branch] gh/zklaus/22/head -> origin/gh/zklaus/22/head 2025-12-04T11:12:36.1983898Z * [new branch] gh/zklaus/22/orig -> origin/gh/zklaus/22/orig 2025-12-04T11:12:36.1983969Z * [new branch] gh/zklaus/23/base -> origin/gh/zklaus/23/base 2025-12-04T11:12:36.1984038Z * [new branch] gh/zklaus/23/head -> origin/gh/zklaus/23/head 2025-12-04T11:12:36.1984109Z * [new branch] gh/zklaus/23/orig -> origin/gh/zklaus/23/orig 2025-12-04T11:12:36.1984181Z * [new branch] gh/zklaus/24/base -> origin/gh/zklaus/24/base 2025-12-04T11:12:36.1984250Z * [new branch] gh/zklaus/24/head -> origin/gh/zklaus/24/head 2025-12-04T11:12:36.1984322Z * [new branch] gh/zklaus/24/orig -> origin/gh/zklaus/24/orig 2025-12-04T11:12:36.1984400Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-12-04T11:12:36.1984475Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-12-04T11:12:36.1984552Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-12-04T11:12:36.1984626Z * [new branch] gh/zou3519/1199/base -> origin/gh/zou3519/1199/base 2025-12-04T11:12:36.1984698Z * [new branch] gh/zou3519/1199/head -> origin/gh/zou3519/1199/head 2025-12-04T11:12:36.1984773Z * [new branch] gh/zou3519/1199/orig -> origin/gh/zou3519/1199/orig 2025-12-04T11:12:36.1984844Z * [new branch] gh/zou3519/1200/base -> origin/gh/zou3519/1200/base 2025-12-04T11:12:36.1984917Z * [new branch] gh/zou3519/1200/head -> origin/gh/zou3519/1200/head 2025-12-04T11:12:36.1984990Z * [new branch] gh/zou3519/1200/orig -> origin/gh/zou3519/1200/orig 2025-12-04T11:12:36.1985061Z * [new branch] gh/zou3519/1201/base -> origin/gh/zou3519/1201/base 2025-12-04T11:12:36.1985132Z * [new branch] gh/zou3519/1201/head -> origin/gh/zou3519/1201/head 2025-12-04T11:12:36.1985204Z * [new branch] gh/zou3519/1201/orig -> origin/gh/zou3519/1201/orig 2025-12-04T11:12:36.1985276Z * [new branch] gh/zou3519/1202/base -> origin/gh/zou3519/1202/base 2025-12-04T11:12:36.1985377Z * [new branch] gh/zou3519/1202/head -> origin/gh/zou3519/1202/head 2025-12-04T11:12:36.1985452Z * [new branch] gh/zou3519/1202/orig -> origin/gh/zou3519/1202/orig 2025-12-04T11:12:36.1985524Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-12-04T11:12:36.1985618Z * [new branch] gh/zpcore/1/head -> 
origin/gh/zpcore/1/head 2025-12-04T11:12:36.1985691Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-12-04T11:12:36.1985763Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-12-04T11:12:36.1985834Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-12-04T11:12:36.1985908Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-12-04T11:12:36.1985978Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-12-04T11:12:36.1986053Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-12-04T11:12:36.1986123Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-12-04T11:12:36.1986192Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-12-04T11:12:36.1986265Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-12-04T11:12:36.1986335Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-12-04T11:12:36.1986405Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-12-04T11:12:36.1986477Z * [new branch] gh/zpcore/14/orig -> origin/gh/zpcore/14/orig 2025-12-04T11:12:36.1986548Z * [new branch] gh/zpcore/15/base -> origin/gh/zpcore/15/base 2025-12-04T11:12:36.1986618Z * [new branch] gh/zpcore/15/head -> origin/gh/zpcore/15/head 2025-12-04T11:12:36.1986691Z * [new branch] gh/zpcore/15/orig -> origin/gh/zpcore/15/orig 2025-12-04T11:12:36.1986762Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-12-04T11:12:36.1986833Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-12-04T11:12:36.1986906Z * [new branch] gh/zpcore/21/base -> origin/gh/zpcore/21/base 2025-12-04T11:12:36.1986976Z * [new branch] gh/zpcore/21/head -> origin/gh/zpcore/21/head 2025-12-04T11:12:36.1987046Z * [new branch] gh/zpcore/21/orig -> origin/gh/zpcore/21/orig 2025-12-04T11:12:36.1987117Z * [new branch] gh/zpcore/22/base -> origin/gh/zpcore/22/base 2025-12-04T11:12:36.1987186Z * [new branch] gh/zpcore/22/head -> origin/gh/zpcore/22/head 2025-12-04T11:12:36.1987256Z * [new branch] gh/zpcore/22/orig -> origin/gh/zpcore/22/orig 2025-12-04T11:12:36.1987330Z * [new branch] gh/zpcore/23/base -> origin/gh/zpcore/23/base 2025-12-04T11:12:36.1987401Z * [new branch] gh/zpcore/23/head -> origin/gh/zpcore/23/head 2025-12-04T11:12:36.1987473Z * [new branch] gh/zpcore/23/orig -> origin/gh/zpcore/23/orig 2025-12-04T11:12:36.1987543Z * [new branch] gh/zpcore/24/base -> origin/gh/zpcore/24/base 2025-12-04T11:12:36.1987613Z * [new branch] gh/zpcore/24/head -> origin/gh/zpcore/24/head 2025-12-04T11:12:36.1987687Z * [new branch] gh/zpcore/24/orig -> origin/gh/zpcore/24/orig 2025-12-04T11:12:36.1987757Z * [new branch] gh/zpcore/25/base -> origin/gh/zpcore/25/base 2025-12-04T11:12:36.1987827Z * [new branch] gh/zpcore/25/head -> origin/gh/zpcore/25/head 2025-12-04T11:12:36.1987899Z * [new branch] gh/zpcore/25/orig -> origin/gh/zpcore/25/orig 2025-12-04T11:12:36.1987989Z * [new branch] gh/zpcore/26/base -> origin/gh/zpcore/26/base 2025-12-04T11:12:36.1988061Z * [new branch] gh/zpcore/26/head -> origin/gh/zpcore/26/head 2025-12-04T11:12:36.1988155Z * [new branch] gh/zpcore/26/orig -> origin/gh/zpcore/26/orig 2025-12-04T11:12:36.1988225Z * [new branch] gh/zpcore/27/base -> origin/gh/zpcore/27/base 2025-12-04T11:12:36.1988294Z * [new branch] gh/zpcore/27/head -> origin/gh/zpcore/27/head 2025-12-04T11:12:36.1988367Z * [new branch] gh/zpcore/27/orig -> origin/gh/zpcore/27/orig 2025-12-04T11:12:36.1988437Z * [new branch] gh/zpcore/28/base -> origin/gh/zpcore/28/base 
2025-12-04T11:12:36.1988506Z * [new branch] gh/zpcore/28/head -> origin/gh/zpcore/28/head 2025-12-04T11:12:36.1988578Z * [new branch] gh/zpcore/28/orig -> origin/gh/zpcore/28/orig 2025-12-04T11:12:36.1988649Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-12-04T11:12:36.1988721Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-12-04T11:12:36.1988796Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-12-04T11:12:36.1988865Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-12-04T11:12:36.1988935Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-12-04T11:12:36.1989004Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-12-04T11:12:36.1989074Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-12-04T11:12:36.1989144Z * [new branch] gh/zpcore/6/head -> origin/gh/zpcore/6/head 2025-12-04T11:12:36.1989213Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-12-04T11:12:36.1989285Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-12-04T11:12:36.1989355Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-12-04T11:12:36.1989427Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-12-04T11:12:36.1989500Z * [new branch] google-main -> origin/google-main 2025-12-04T11:12:36.1989591Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-12-04T11:12:36.1989666Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-12-04T11:12:36.1989845Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-12-04T11:12:36.1989967Z * [new branch] hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass 2025-12-04T11:12:36.1990106Z * [new branch] hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests 2025-12-04T11:12:36.1990214Z * [new branch] hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose 2025-12-04T11:12:36.1990285Z * [new branch] hc_baseline -> origin/hc_baseline 2025-12-04T11:12:36.1990349Z * [new branch] hhh_rand -> origin/hhh_rand 2025-12-04T11:12:36.1990412Z * [new branch] huba/f1 -> origin/huba/f1 2025-12-04T11:12:36.1990598Z * [new branch] increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test 2025-12-04T11:12:36.1990663Z * [new branch] inlining -> origin/inlining 2025-12-04T11:12:36.1990738Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-12-04T11:12:36.1990864Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-12-04T11:12:36.1991044Z * [new branch] instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters 2025-12-04T11:12:36.1991148Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-12-04T11:12:36.1991216Z * [new branch] issue#58739 -> origin/issue#58739 2025-12-04T11:12:36.1991299Z * [new branch] jainapurva-patch-1 -> origin/jainapurva-patch-1 2025-12-04T11:12:36.1991362Z * [new branch] jathu/o3 -> origin/jathu/o3 2025-12-04T11:12:36.1991425Z * [new branch] jathu/sve -> origin/jathu/sve 2025-12-04T11:12:36.1991548Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-12-04T11:12:36.1991657Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-12-04T11:12:36.1991767Z * [new branch] jiannanWang/memorysnapshot_filter -> 
origin/jiannanWang/memorysnapshot_filter 2025-12-04T11:12:36.1991878Z * [new branch] jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning 2025-12-04T11:12:36.1991968Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-12-04T11:12:36.1992059Z * [new branch] jithunnair-amd-patch-10 -> origin/jithunnair-amd-patch-10 2025-12-04T11:12:36.1992149Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-12-04T11:12:36.1992235Z * [new branch] jithunnair-amd-patch-3 -> origin/jithunnair-amd-patch-3 2025-12-04T11:12:36.1992318Z * [new branch] jithunnair-amd-patch-4 -> origin/jithunnair-amd-patch-4 2025-12-04T11:12:36.1992403Z * [new branch] jithunnair-amd-patch-5 -> origin/jithunnair-amd-patch-5 2025-12-04T11:12:36.1992487Z * [new branch] jithunnair-amd-patch-6 -> origin/jithunnair-amd-patch-6 2025-12-04T11:12:36.1992570Z * [new branch] jithunnair-amd-patch-7 -> origin/jithunnair-amd-patch-7 2025-12-04T11:12:36.1992658Z * [new branch] jithunnair-amd-patch-8 -> origin/jithunnair-amd-patch-8 2025-12-04T11:12:36.1992742Z * [new branch] jithunnair-amd-patch-9 -> origin/jithunnair-amd-patch-9 2025-12-04T11:12:36.1992824Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-12-04T11:12:36.1992899Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-12-04T11:12:36.1992966Z * [new branch] kainan_test -> origin/kainan_test 2025-12-04T11:12:36.1993049Z * [new branch] larryliu0820-patch-1 -> origin/larryliu0820-patch-1 2025-12-04T11:12:36.1993156Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-12-04T11:12:36.1993259Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 2025-12-04T11:12:36.1993345Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-12-04T11:12:36.1993451Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-12-04T11:12:36.1993536Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-12-04T11:12:36.1993608Z * [new branch] llama4-stable -> origin/llama4-stable 2025-12-04T11:12:36.1993682Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-12-04T11:12:36.1993758Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-12-04T11:12:36.1993862Z * [new branch] lucaskabela/fix_164876 -> origin/lucaskabela/fix_164876 2025-12-04T11:12:36.1993948Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-12-04T11:12:36.1994044Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-12-04T11:12:36.1994180Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-12-04T11:12:36.1994305Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-12-04T11:12:36.1994418Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-12-04T11:12:36.1994551Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-12-04T11:12:36.1994635Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-12-04T11:12:36.1994728Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-12-04T11:12:36.1994828Z * [new branch] lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager 2025-12-04T11:12:36.1994923Z * [new 
branch] lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module 2025-12-04T11:12:36.1995024Z * [new branch] lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined 2025-12-04T11:12:36.1995122Z * [new branch] lucaskabela/typing_variables -> origin/lucaskabela/typing_variables 2025-12-04T11:12:36.1995230Z * [new branch] lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts 2025-12-04T11:12:36.1995352Z * [new branch] lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions 2025-12-04T11:12:36.1995461Z * [new branch] lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists 2025-12-04T11:12:36.1995538Z * [new branch] lw/torch_box_by_ref -> origin/lw/torch_box_by_ref 2025-12-04T11:12:36.1995603Z * [new branch] main -> origin/main 2025-12-04T11:12:36.1995683Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-12-04T11:12:36.1995757Z * [new branch] malfet-patch-2 -> origin/malfet-patch-2 2025-12-04T11:12:36.1995830Z * [new branch] malfet-patch-3 -> origin/malfet-patch-3 2025-12-04T11:12:36.1995900Z * [new branch] malfet-patch-4 -> origin/malfet-patch-4 2025-12-04T11:12:36.1995968Z * [new branch] malfet-patch-5 -> origin/malfet-patch-5 2025-12-04T11:12:36.1996038Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-12-04T11:12:36.1996107Z * [new branch] malfet-patch-7 -> origin/malfet-patch-7 2025-12-04T11:12:36.1996176Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-12-04T11:12:36.1996254Z * [new branch] malfet/add-3.14-ci -> origin/malfet/add-3.14-ci 2025-12-04T11:12:36.1996417Z * [new branch] malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts 2025-12-04T11:12:36.1996584Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-12-04T11:12:36.1996711Z * [new branch] malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers 2025-12-04T11:12:36.1996809Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-12-04T11:12:36.1996927Z * [new branch] manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe 2025-12-04T11:12:36.1997041Z * [new branch] manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp 2025-12-04T11:12:36.1997124Z * [new branch] masnesral/metaconda -> origin/masnesral/metaconda 2025-12-04T11:12:36.1997234Z * [new branch] mem_profiler_flaky_fix -> origin/mem_profiler_flaky_fix 2025-12-04T11:12:36.1997318Z * [new branch] mem_profiler_stack_trace -> origin/mem_profiler_stack_trace 2025-12-04T11:12:36.1997398Z * [new branch] memory_profiler_stack -> origin/memory_profiler_stack 2025-12-04T11:12:36.1997478Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-12-04T11:12:36.1997547Z * [new branch] mingw_posix -> origin/mingw_posix 2025-12-04T11:12:36.1997623Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-12-04T11:12:36.1997691Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-12-04T11:12:36.1997757Z * [new branch] mlazos/acts -> origin/mlazos/acts 2025-12-04T11:12:36.1997834Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-12-04T11:12:36.1997919Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-12-04T11:12:36.1998018Z * [new branch] mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks 2025-12-04T11:12:36.1998093Z * [new branch] 
mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-12-04T11:12:36.1998164Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-12-04T11:12:36.1998235Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-12-04T11:12:36.1998307Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-12-04T11:12:36.1998374Z * [new branch] mlazos/bwd -> origin/mlazos/bwd 2025-12-04T11:12:36.1998448Z * [new branch] mlazos/combo-test -> origin/mlazos/combo-test 2025-12-04T11:12:36.1998527Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-12-04T11:12:36.1998606Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-12-04T11:12:36.1998692Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-12-04T11:12:36.1998795Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-12-04T11:12:36.1998874Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-12-04T11:12:36.1998962Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-12-04T11:12:36.1999049Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-12-04T11:12:36.1999121Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-12-04T11:12:36.1999195Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-12-04T11:12:36.1999270Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-12-04T11:12:36.1999344Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-12-04T11:12:36.1999415Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-12-04T11:12:36.1999489Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-12-04T11:12:36.1999555Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-12-04T11:12:36.1999641Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-12-04T11:12:36.1999761Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-12-04T11:12:36.1999875Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-12-04T11:12:36.1999946Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-12-04T11:12:36.2000067Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-12-04T11:12:36.2000139Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-12-04T11:12:36.2000211Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-12-04T11:12:36.2000282Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-12-04T11:12:36.2000351Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-12-04T11:12:36.2000426Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-12-04T11:12:36.2000489Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-12-04T11:12:36.2000564Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-12-04T11:12:36.2000634Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-12-04T11:12:36.2000706Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-12-04T11:12:36.2000776Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-12-04T11:12:36.2000846Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-12-04T11:12:36.2000913Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-12-04T11:12:36.2000978Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 2025-12-04T11:12:36.2001044Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-12-04T11:12:36.2001109Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-12-04T11:12:36.2001174Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 
2025-12-04T11:12:36.2001240Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-12-04T11:12:36.2001304Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-12-04T11:12:36.2001368Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-12-04T11:12:36.2001435Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-12-04T11:12:36.2001500Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-12-04T11:12:36.2001564Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-12-04T11:12:36.2001627Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-12-04T11:12:36.2001689Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-12-04T11:12:36.2001754Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-12-04T11:12:36.2001830Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-12-04T11:12:36.2001917Z * [new branch] mlazos/inductor-streams -> origin/mlazos/inductor-streams 2025-12-04T11:12:36.2001986Z * [new branch] mlazos/main -> origin/mlazos/main 2025-12-04T11:12:36.2002051Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-12-04T11:12:36.2002126Z * [new branch] mlazos/meta-guards -> origin/mlazos/meta-guards 2025-12-04T11:12:36.2002232Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-12-04T11:12:36.2002328Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-12-04T11:12:36.2002398Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-12-04T11:12:36.2002495Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-12-04T11:12:36.2002566Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-12-04T11:12:36.2002665Z * [new branch] mlazos/overguarding -> origin/mlazos/overguarding 2025-12-04T11:12:36.2002744Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-12-04T11:12:36.2002816Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-12-04T11:12:36.2002890Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-12-04T11:12:36.2002968Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-12-04T11:12:36.2003037Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-12-04T11:12:36.2003107Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-12-04T11:12:36.2003177Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-12-04T11:12:36.2003259Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-12-04T11:12:36.2003350Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 2025-12-04T11:12:36.2003420Z * [new branch] mlazos/stests -> origin/mlazos/stests 2025-12-04T11:12:36.2003496Z * [new branch] mlazos/stream-ops -> origin/mlazos/stream-ops 2025-12-04T11:12:36.2003565Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-12-04T11:12:36.2003649Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-12-04T11:12:36.2003714Z * [new branch] mlazos/test -> origin/mlazos/test 2025-12-04T11:12:36.2003784Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-12-04T11:12:36.2003868Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-12-04T11:12:36.2003948Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-12-04T11:12:36.2004033Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-12-04T11:12:36.2004114Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-12-04T11:12:36.2004194Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 
2025-12-04T11:12:36.2004270Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-12-04T11:12:36.2004347Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-12-04T11:12:36.2004424Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-12-04T11:12:36.2004507Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-12-04T11:12:36.2004592Z * [new branch] mlazos/user-stream-base -> origin/mlazos/user-stream-base 2025-12-04T11:12:36.2004669Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-12-04T11:12:36.2004767Z * [new branch] mlazos/user-streams-backup -> origin/mlazos/user-streams-backup 2025-12-04T11:12:36.2004861Z * [new branch] mlazos/user-streams-backup2 -> origin/mlazos/user-streams-backup2 2025-12-04T11:12:36.2004933Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-12-04T11:12:36.2005008Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-12-04T11:12:36.2005084Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-12-04T11:12:36.2005160Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-12-04T11:12:36.2005246Z * [new branch] module-shim -> origin/module-shim 2025-12-04T11:12:36.2005312Z * [new branch] move_config -> origin/move_config 2025-12-04T11:12:36.2005422Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-12-04T11:12:36.2005494Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-12-04T11:12:36.2005595Z * [new branch] mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape 2025-12-04T11:12:36.2005666Z * [new branch] my_varlen_backup -> origin/my_varlen_backup 2025-12-04T11:12:36.2005744Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-12-04T11:12:36.2005809Z * [new branch] new-codegen -> origin/new-codegen 2025-12-04T11:12:36.2005880Z * [new branch] newtest-base -> origin/newtest-base 2025-12-04T11:12:36.2005958Z * [new branch] ngimel/addmm_dtype -> origin/ngimel/addmm_dtype 2025-12-04T11:12:36.2006027Z * [new branch] ngimel/div_inv -> origin/ngimel/div_inv 2025-12-04T11:12:36.2006112Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-12-04T11:12:36.2006187Z * [new branch] ngimel/gather_grid -> origin/ngimel/gather_grid 2025-12-04T11:12:36.2006274Z * [new branch] ngimel/gather_grid_release -> origin/ngimel/gather_grid_release 2025-12-04T11:12:36.2006345Z * [new branch] ngimel/gg_new -> origin/ngimel/gg_new 2025-12-04T11:12:36.2006418Z * [new branch] ngimel/hostalloc -> origin/ngimel/hostalloc 2025-12-04T11:12:36.2006493Z * [new branch] ngimel/storage_id -> origin/ngimel/storage_id 2025-12-04T11:12:36.2006558Z * [new branch] nightly -> origin/nightly 2025-12-04T11:12:36.2006677Z * [new branch] nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check 2025-12-04T11:12:36.2006800Z * [new branch] nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias 2025-12-04T11:12:36.2006929Z * [new branch] nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor 2025-12-04T11:12:36.2007052Z * [new branch] nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch 2025-12-04T11:12:36.2007167Z * [new branch] nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions 2025-12-04T11:12:36.2007280Z * [new branch] nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index 
2025-12-04T11:12:36.2007351Z * [new branch] nikitaved/test -> origin/nikitaved/test 2025-12-04T11:12:36.2007478Z * [new branch] nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune 2025-12-04T11:12:36.2007560Z * [new branch] no_distributed_log_spew -> origin/no_distributed_log_spew 2025-12-04T11:12:36.2007627Z * [new branch] nofun-hack -> origin/nofun-hack 2025-12-04T11:12:36.2007692Z * [new branch] norm_bench -> origin/norm_bench 2025-12-04T11:12:36.2007770Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-12-04T11:12:36.2007846Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-12-04T11:12:36.2007916Z * [new branch] optimizer_test -> origin/optimizer_test 2025-12-04T11:12:36.2007989Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-12-04T11:12:36.2008084Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-12-04T11:12:36.2008157Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-12-04T11:12:36.2008228Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-12-04T11:12:36.2008323Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-12-04T11:12:36.2008394Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-12-04T11:12:36.2008463Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-12-04T11:12:36.2008535Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-12-04T11:12:36.2008606Z * [new branch] orig/release/2.0 -> origin/orig/release/2.0 2025-12-04T11:12:36.2008676Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-12-04T11:12:36.2008748Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-12-04T11:12:36.2008816Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-12-04T11:12:36.2008886Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-12-04T11:12:36.2008959Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-12-04T11:12:36.2009030Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-12-04T11:12:36.2009099Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-12-04T11:12:36.2009169Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-12-04T11:12:36.2009239Z * [new branch] orig/release/2.9 -> origin/orig/release/2.9 2025-12-04T11:12:36.2009327Z * [new branch] origin/gh/fxdawnn/1/base -> origin/origin/gh/fxdawnn/1/base 2025-12-04T11:12:36.2009415Z * [new branch] origin/gh/fxdawnn/1/orig -> origin/origin/gh/fxdawnn/1/orig 2025-12-04T11:12:36.2009501Z * [new branch] origin/gh/zpcore/14/orig -> origin/origin/gh/zpcore/14/orig 2025-12-04T11:12:36.2009578Z * [new branch] oulgen-patch-1 -> origin/oulgen-patch-1 2025-12-04T11:12:36.2009650Z * [new branch] oulgen-patch-2 -> origin/oulgen-patch-2 2025-12-04T11:12:36.2009765Z * [new branch] oulgen-patch-3 -> origin/oulgen-patch-3 2025-12-04T11:12:36.2009838Z * [new branch] oulgen-patch-4 -> origin/oulgen-patch-4 2025-12-04T11:12:36.2009909Z * [new branch] padded-tensor -> origin/padded-tensor 2025-12-04T11:12:36.2009973Z * [new branch] pca2 -> origin/pca2 2025-12-04T11:12:36.2010049Z * [new branch] per_channel_backup -> origin/per_channel_backup 2025-12-04T11:12:36.2010116Z * [new branch] perf_ops -> origin/perf_ops 2025-12-04T11:12:36.2010182Z * [new branch] perf_ops_2_9 -> origin/perf_ops_2_9 2025-12-04T11:12:36.2010259Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-12-04T11:12:36.2010349Z * [new branch] pianpwk/__draft_debug_mode -> 
origin/pianpwk/__draft_debug_mode 2025-12-04T11:12:36.2010459Z * [new branch] pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft 2025-12-04T11:12:36.2010564Z * [new branch] pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile 2025-12-04T11:12:36.2010650Z * [new branch] pianpwk/_draft_triton_11_3 -> origin/pianpwk/_draft_triton_11_3 2025-12-04T11:12:36.2010741Z * [new branch] pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft 2025-12-04T11:12:36.2010896Z * [new branch] pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys 2025-12-04T11:12:36.2010994Z * [new branch] pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode 2025-12-04T11:12:36.2011140Z * [new branch] pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size 2025-12-04T11:12:36.2011222Z * [new branch] pianpwk/anomaly_tb -> origin/pianpwk/anomaly_tb 2025-12-04T11:12:36.2011309Z * [new branch] pianpwk/auto_fx_annotate -> origin/pianpwk/auto_fx_annotate 2025-12-04T11:12:36.2011424Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-12-04T11:12:36.2011513Z * [new branch] pianpwk/bert_dynamic_perf -> origin/pianpwk/bert_dynamic_perf 2025-12-04T11:12:36.2011610Z * [new branch] pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces 2025-12-04T11:12:36.2011702Z * [new branch] pianpwk/debug_hash_tensor -> origin/pianpwk/debug_hash_tensor 2025-12-04T11:12:36.2011791Z * [new branch] pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate 2025-12-04T11:12:36.2011880Z * [new branch] pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults 2025-12-04T11:12:36.2011967Z * [new branch] pianpwk/debug_mode_hacks -> origin/pianpwk/debug_mode_hacks 2025-12-04T11:12:36.2012072Z * [new branch] pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor 2025-12-04T11:12:36.2012158Z * [new branch] pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids 2025-12-04T11:12:36.2012245Z * [new branch] pianpwk/debug_mode_triton -> origin/pianpwk/debug_mode_triton 2025-12-04T11:12:36.2012340Z * [new branch] pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace 2025-12-04T11:12:36.2012440Z * [new branch] pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective 2025-12-04T11:12:36.2012538Z * [new branch] pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf 2025-12-04T11:12:36.2012663Z * [new branch] pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug 2025-12-04T11:12:36.2012771Z * [new branch] pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile 2025-12-04T11:12:36.2012864Z * [new branch] pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn 2025-12-04T11:12:36.2012978Z * [new branch] pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5 2025-12-04T11:12:36.2013072Z * [new branch] pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk 2025-12-04T11:12:36.2013176Z * [new branch] pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath 2025-12-04T11:12:36.2013262Z * [new branch] pianpwk/event_list_tree -> origin/pianpwk/event_list_tree 2025-12-04T11:12:36.2013349Z * [new branch] pianpwk/false_numel_refs -> origin/pianpwk/false_numel_refs 2025-12-04T11:12:36.2013432Z * [new branch] pianpwk/maybe_guard_rel -> 
origin/pianpwk/maybe_guard_rel 2025-12-04T11:12:36.2013535Z * [new branch] pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft 2025-12-04T11:12:36.2013644Z * [new branch] pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat 2025-12-04T11:12:36.2013760Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-12-04T11:12:36.2013845Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-12-04T11:12:36.2013982Z * [new branch] pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate 2025-12-04T11:12:36.2014086Z * [new branch] pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards 2025-12-04T11:12:36.2014192Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-12-04T11:12:36.2014275Z * [new branch] pianpwk/symint_one_hot -> origin/pianpwk/symint_one_hot 2025-12-04T11:12:36.2014389Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-12-04T11:12:36.2014487Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-12-04T11:12:36.2014569Z * [new branch] pianpwk/try_dumb_stuff -> origin/pianpwk/try_dumb_stuff 2025-12-04T11:12:36.2014652Z * [new branch] pianpwk/try_dumb_stuff_2 -> origin/pianpwk/try_dumb_stuff_2 2025-12-04T11:12:36.2014748Z * [new branch] pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm 2025-12-04T11:12:36.2014843Z * [new branch] pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2 2025-12-04T11:12:36.2014923Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-12-04T11:12:36.2015007Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-12-04T11:12:36.2015099Z * [new branch] piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112 2025-12-04T11:12:36.2015178Z * [new branch] piz/prop_cache_clean -> origin/piz/prop_cache_clean 2025-12-04T11:12:36.2015252Z * [new branch] pool-separate -> origin/pool-separate 2025-12-04T11:12:36.2015322Z * [new branch] pr-156087 -> origin/pr-156087 2025-12-04T11:12:36.2015385Z * [new branch] pr/131860 -> origin/pr/131860 2025-12-04T11:12:36.2015461Z * [new branch] predispatch_to -> origin/predispatch_to 2025-12-04T11:12:36.2015529Z * [new branch] protect-c17 -> origin/protect-c17 2025-12-04T11:12:36.2015601Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-12-04T11:12:36.2015686Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-12-04T11:12:36.2015815Z * [new branch] q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown 2025-12-04T11:12:36.2015953Z * [new branch] q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args 2025-12-04T11:12:36.2016039Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-12-04T11:12:36.2016115Z * [new branch] quote-pytest_cache -> origin/quote-pytest_cache 2025-12-04T11:12:36.2016216Z * [new branch] reland-accgrad-stream-warn -> origin/reland-accgrad-stream-warn 2025-12-04T11:12:36.2016284Z * [new branch] release/1.10 -> origin/release/1.10 2025-12-04T11:12:36.2016353Z * [new branch] release/1.11 -> origin/release/1.11 2025-12-04T11:12:36.2016422Z * [new branch] release/1.12 -> origin/release/1.12 2025-12-04T11:12:36.2016486Z * [new branch] release/1.13 -> origin/release/1.13 
2025-12-04T11:12:36.2016550Z * [new branch] release/1.4 -> origin/release/1.4 2025-12-04T11:12:36.2016619Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-12-04T11:12:36.2016682Z * [new branch] release/1.5 -> origin/release/1.5 2025-12-04T11:12:36.2016746Z * [new branch] release/1.6 -> origin/release/1.6 2025-12-04T11:12:36.2016838Z * [new branch] release/1.7 -> origin/release/1.7 2025-12-04T11:12:36.2016901Z * [new branch] release/1.8 -> origin/release/1.8 2025-12-04T11:12:36.2016964Z * [new branch] release/1.9 -> origin/release/1.9 2025-12-04T11:12:36.2017050Z * [new branch] release/2.0 -> origin/release/2.0 2025-12-04T11:12:36.2017113Z * [new branch] release/2.1 -> origin/release/2.1 2025-12-04T11:12:36.2017177Z * [new branch] release/2.2 -> origin/release/2.2 2025-12-04T11:12:36.2017239Z * [new branch] release/2.3 -> origin/release/2.3 2025-12-04T11:12:36.2017301Z * [new branch] release/2.4 -> origin/release/2.4 2025-12-04T11:12:36.2017364Z * [new branch] release/2.5 -> origin/release/2.5 2025-12-04T11:12:36.2017428Z * [new branch] release/2.6 -> origin/release/2.6 2025-12-04T11:12:36.2017491Z * [new branch] release/2.7 -> origin/release/2.7 2025-12-04T11:12:36.2017555Z * [new branch] release/2.8 -> origin/release/2.8 2025-12-04T11:12:36.2017619Z * [new branch] release/2.9 -> origin/release/2.9 2025-12-04T11:12:36.2017687Z * [new branch] release_notes -> origin/release_notes 2025-12-04T11:12:36.2017768Z * [new branch] remove_pyinterpreter -> origin/remove_pyinterpreter 2025-12-04T11:12:36.2017893Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-12-04T11:12:36.2018015Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-12-04T11:12:36.2018136Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-12-04T11:12:36.2018253Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-12-04T11:12:36.2018382Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-12-04T11:12:36.2018500Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-12-04T11:12:36.2018601Z * [new branch] revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head 2025-12-04T11:12:36.2018702Z * [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-12-04T11:12:36.2018871Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-12-04T11:12:36.2018968Z * [new branch] revert-hoo-invoke-subgraph -> origin/revert-hoo-invoke-subgraph 2025-12-04T11:12:36.2019068Z * [new branch] revert_always_build_distributed -> origin/revert_always_build_distributed 2025-12-04T11:12:36.2019138Z * [new branch] rms_norm_patch -> origin/rms_norm_patch 2025-12-04T11:12:36.2019235Z * [new branch] ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation 2025-12-04T11:12:36.2025079Z * [new branch] ruisi/fix_comm_estimation -> origin/ruisi/fix_comm_estimation 2025-12-04T11:12:36.2025202Z * [new branch] ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation 2025-12-04T11:12:36.2025304Z * [new branch] ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing 2025-12-04T11:12:36.2025410Z * [new branch] ruisi/fix_manual_bucketing_ep_pass -> 
origin/ruisi/fix_manual_bucketing_ep_pass 2025-12-04T11:12:36.2025502Z * [new branch] ruisi/manual_bucket_pass -> origin/ruisi/manual_bucket_pass 2025-12-04T11:12:36.2025703Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-12-04T11:12:36.2025796Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-12-04T11:12:36.2025917Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-12-04T11:12:36.2025982Z * [new branch] rzou/njt -> origin/rzou/njt 2025-12-04T11:12:36.2026045Z * [new branch] rzou/pca -> origin/rzou/pca 2025-12-04T11:12:36.2026115Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-12-04T11:12:36.2026182Z * [new branch] samplevllm -> origin/samplevllm 2025-12-04T11:12:36.2026350Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-12-04T11:12:36.2026457Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-12-04T11:12:36.2026573Z * [new branch] sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain 2025-12-04T11:12:36.2026636Z * [new branch] save -> origin/save 2025-12-04T11:12:36.2026701Z * [new branch] scaled_mm -> origin/scaled_mm 2025-12-04T11:12:36.2026767Z * [new branch] scan_attempt -> origin/scan_attempt 2025-12-04T11:12:36.2026831Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-12-04T11:12:36.2026939Z * [new branch] sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix 2025-12-04T11:12:36.2027020Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-12-04T11:12:36.2027109Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-12-04T11:12:36.2027187Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-12-04T11:12:36.2027270Z * [new branch] some_rocm_inductor_skips -> origin/some_rocm_inductor_skips 2025-12-04T11:12:36.2027360Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-12-04T11:12:36.2027446Z * [new branch] sparse-mm-bf16-support -> origin/sparse-mm-bf16-support 2025-12-04T11:12:36.2027524Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-12-04T11:12:36.2027588Z * [new branch] suo -> origin/suo 2025-12-04T11:12:36.2027652Z * [new branch] sve-poc -> origin/sve-poc 2025-12-04T11:12:36.2027716Z * [new branch] switch-bn -> origin/switch-bn 2025-12-04T11:12:36.2027812Z * [new branch] sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop 2025-12-04T11:12:36.2027885Z * [new branch] sy_aot_eager_record -> origin/sy_aot_eager_record 2025-12-04T11:12:36.2027959Z * [new branch] sy_custom_bucketing -> origin/sy_custom_bucketing 2025-12-04T11:12:36.2028035Z * [new branch] sy_debug_mode_test -> origin/sy_debug_mode_test 2025-12-04T11:12:36.2028104Z * [new branch] sy_deserialize -> origin/sy_deserialize 2025-12-04T11:12:36.2028172Z * [new branch] sy_dump_gm_code -> origin/sy_dump_gm_code 2025-12-04T11:12:36.2028234Z * [new branch] sy_exp -> origin/sy_exp 2025-12-04T11:12:36.2028309Z * [new branch] sy_export_annotation -> origin/sy_export_annotation 2025-12-04T11:12:36.2028381Z * [new branch] sy_invoke_subgraph -> origin/sy_invoke_subgraph 2025-12-04T11:12:36.2028475Z * [new branch] sy_kernel_bw_name -> origin/sy_kernel_bw_name 2025-12-04T11:12:36.2028541Z * [new branch] sy_multi_arch -> origin/sy_multi_arch 2025-12-04T11:12:36.2028611Z * [new branch] sy_nn_module_stack -> origin/sy_nn_module_stack 
2025-12-04T11:12:36.2028710Z * [new branch] sy_original_dtensor -> origin/sy_original_dtensor 2025-12-04T11:12:36.2028779Z * [new branch] sy_profiler_cia -> origin/sy_profiler_cia 2025-12-04T11:12:36.2028847Z * [new branch] symm_mem_sync -> origin/symm_mem_sync 2025-12-04T11:12:36.2028936Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 2025-12-04T11:12:36.2029019Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-12-04T11:12:36.2029105Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-12-04T11:12:36.2029171Z * [new branch] test-old -> origin/test-old 2025-12-04T11:12:36.2029238Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-12-04T11:12:36.2029338Z * [new branch] tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix 2025-12-04T11:12:36.2029452Z * [new branch] tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune 2025-12-04T11:12:36.2029538Z * [new branch] tianren/customOp_fusion -> origin/tianren/customOp_fusion 2025-12-04T11:12:36.2029664Z * [new branch] tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark 2025-12-04T11:12:36.2029844Z * [new branch] tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix 2025-12-04T11:12:36.2029948Z * [new branch] tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config 2025-12-04T11:12:36.2030043Z * [new branch] tianren/dynamic_range_input -> origin/tianren/dynamic_range_input 2025-12-04T11:12:36.2030151Z * [new branch] tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix 2025-12-04T11:12:36.2030258Z * [new branch] tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge 2025-12-04T11:12:36.2030358Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-12-04T11:12:36.2030441Z * [new branch] tianren/fx_codegen_dump -> origin/tianren/fx_codegen_dump 2025-12-04T11:12:36.2030530Z * [new branch] tianren/symmetric_memory -> origin/tianren/symmetric_memory 2025-12-04T11:12:36.2030598Z * [new branch] tianren/test -> origin/tianren/test 2025-12-04T11:12:36.2030676Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-12-04T11:12:36.2030740Z * [new branch] tmp -> origin/tmp 2025-12-04T11:12:36.2030809Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-12-04T11:12:36.2030893Z * [new branch] torchtitan_integration -> origin/torchtitan_integration 2025-12-04T11:12:36.2030979Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-12-04T11:12:36.2031065Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-12-04T11:12:36.2031136Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-12-04T11:12:36.2031205Z * [new branch] triton_kernel -> origin/triton_kernel 2025-12-04T11:12:36.2031269Z * [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-12-04T11:12:36.2031333Z * [new branch] type_dec -> origin/type_dec 2025-12-04T11:12:36.2031461Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-12-04T11:12:36.2031599Z * [new branch] update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1 2025-12-04T11:12:36.2031760Z * [new branch] update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1 2025-12-04T11:12:36.2031891Z * [new branch] 
update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1 2025-12-04T11:12:36.2032019Z * [new branch] update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1 2025-12-04T11:12:36.2032147Z * [new branch] update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1 2025-12-04T11:12:36.2032278Z * [new branch] update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1 2025-12-04T11:12:36.2032414Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-12-04T11:12:36.2032549Z * [new branch] update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1 2025-12-04T11:12:36.2032682Z * [new branch] update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1 2025-12-04T11:12:36.2032812Z * [new branch] update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1 2025-12-04T11:12:36.2032943Z * [new branch] update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1 2025-12-04T11:12:36.2033075Z * [new branch] update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1 2025-12-04T11:12:36.2033204Z * [new branch] update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1 2025-12-04T11:12:36.2033294Z * [new branch] update-vllm-dockerfile -> origin/update-vllm-dockerfile 2025-12-04T11:12:36.2033423Z * [new branch] update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1 2025-12-04T11:12:36.2033543Z * [new branch] update-xla-commit-hash/19422028566-212-1 -> origin/update-xla-commit-hash/19422028566-212-1 2025-12-04T11:12:36.2033665Z * [new branch] update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1 2025-12-04T11:12:36.2033789Z * [new branch] update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-12-04T11:12:36.2033872Z * [new branch] update_operator_readme -> origin/update_operator_readme 2025-12-04T11:12:36.2033966Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-12-04T11:12:36.2034052Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-12-04T11:12:36.2034138Z * [new branch] update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677 2025-12-04T11:12:36.2034224Z * [new branch] update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283 2025-12-04T11:12:36.2034320Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-12-04T11:12:36.2034401Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-12-04T11:12:36.2034492Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-12-04T11:12:36.2034589Z * [new branch] upload-tests-for-autorevert -> origin/upload-tests-for-autorevert 2025-12-04T11:12:36.2034684Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-12-04T11:12:36.2034752Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-12-04T11:12:36.2034812Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-12-04T11:12:36.2034894Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-12-04T11:12:36.2034955Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-12-04T11:12:36.2035014Z * [new branch] v1.3.0 -> 
origin/v1.3.0 2025-12-04T11:12:36.2035072Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-12-04T11:12:36.2035140Z * [new branch] validate_fn -> origin/validate_fn 2025-12-04T11:12:36.2035212Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-12-04T11:12:36.2035285Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-12-04T11:12:36.2035354Z * [new branch] varlen-api -> origin/varlen-api 2025-12-04T11:12:36.2035433Z * [new branch] varlen-api-backup -> origin/varlen-api-backup 2025-12-04T11:12:36.2035517Z * [new branch] varlen_batch_invariance -> origin/varlen_batch_invariance 2025-12-04T11:12:36.2035586Z * [new branch] viable/strict -> origin/viable/strict 2025-12-04T11:12:36.2035702Z * [new branch] vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy 2025-12-04T11:12:36.2035770Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-12-04T11:12:36.2035832Z * [new branch] vllmpin -> origin/vllmpin 2025-12-04T11:12:36.2035926Z * [new branch] vscode-recommend-pyrefly -> origin/vscode-recommend-pyrefly 2025-12-04T11:12:36.2035999Z * [new branch] wdvr-patch-1 -> origin/wdvr-patch-1 2025-12-04T11:12:36.2036066Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-12-04T11:12:36.2036128Z * [new branch] whc/pei -> origin/whc/pei 2025-12-04T11:12:36.2036198Z * [new branch] whc/pp_fix -> origin/whc/pp_fix 2025-12-04T11:12:36.2036266Z * [new branch] whc/sharding -> origin/whc/sharding 2025-12-04T11:12:36.2036333Z * [new branch] whc/sharding2 -> origin/whc/sharding2 2025-12-04T11:12:36.2036398Z * [new branch] whc/uneven -> origin/whc/uneven 2025-12-04T11:12:36.2036471Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-12-04T11:12:36.2036536Z * [new branch] win_warnings -> origin/win_warnings 2025-12-04T11:12:36.2036621Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-12-04T11:12:36.2036685Z * [new branch] xmfan-war -> origin/xmfan-war 2025-12-04T11:12:36.2036750Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-12-04T11:12:36.2036826Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-12-04T11:12:36.2036976Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-12-04T11:12:36.2037051Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-12-04T11:12:36.2037123Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-12-04T11:12:36.2037189Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-12-04T11:12:36.2037254Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-12-04T11:12:36.2037356Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-12-04T11:12:36.2037426Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-12-04T11:12:36.2037527Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-12-04T11:12:36.2037606Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-12-04T11:12:36.2037672Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-12-04T11:12:36.2037744Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-12-04T11:12:36.2037810Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-12-04T11:12:36.2037879Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-12-04T11:12:36.2037952Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-12-04T11:12:36.2038045Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 
2025-12-04T11:12:36.2038117Z * [new branch] xmfan/cacu_jun18 -> origin/xmfan/cacu_jun18 2025-12-04T11:12:36.2038190Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-12-04T11:12:36.2038258Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-12-04T11:12:36.2038345Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-12-04T11:12:36.2038443Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-12-04T11:12:36.2038593Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T11:12:36.2038742Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T11:12:36.2038816Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-12-04T11:12:36.2038883Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-12-04T11:12:36.2038949Z * [new branch] xmfan/test -> origin/xmfan/test 2025-12-04T11:12:36.2039039Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-12-04T11:12:36.2039119Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-12-04T11:12:36.2039214Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-12-04T11:12:36.2039286Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 2025-12-04T11:12:36.2039386Z * [new branch] yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop 2025-12-04T11:12:36.2039456Z * [new branch] yolo-llama3 -> origin/yolo-llama3 2025-12-04T11:12:36.2039530Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-12-04T11:12:36.2039624Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-12-04T11:12:36.2039764Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-12-04T11:12:36.2039831Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-12-04T11:12:36.2039910Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-12-04T11:12:36.2039971Z * [new branch] zb2p -> origin/zb2p 2025-12-04T11:12:36.2040060Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-12-04T11:12:36.2040150Z * [new branch] zhxchen17/ci/vllm_lora_oom -> origin/zhxchen17/ci/vllm_lora_oom 2025-12-04T11:12:36.2041185Z * [new branch] zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom 2025-12-04T11:12:36.2041268Z * [new branch] zhxchen17/ci/vllm_pin -> origin/zhxchen17/ci/vllm_pin 2025-12-04T11:12:36.2041417Z * [new branch] zhxchen17/dynamo/unsafe_drop_all_guards -> origin/zhxchen17/dynamo/unsafe_drop_all_guards 2025-12-04T11:12:36.2041516Z * [new branch] zhxchen17/export/call_override -> origin/zhxchen17/export/call_override 2025-12-04T11:12:36.2041604Z * [new branch] zhxchen17/export/codemod1 -> origin/zhxchen17/export/codemod1 2025-12-04T11:12:36.2041695Z * [new branch] zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return 2025-12-04T11:12:36.2041823Z * [new branch] zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn 2025-12-04T11:12:36.2041923Z * [new branch] zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check 2025-12-04T11:12:36.2042013Z * [new branch] zhxchen17/precompile/aoti -> origin/zhxchen17/precompile/aoti 2025-12-04T11:12:36.2042110Z * [new branch] zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals 
2025-12-04T11:12:36.2042229Z * [new branch] zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards 2025-12-04T11:12:36.2042308Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-12-04T11:12:36.2042412Z * [new branch] zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update 2025-12-04T11:12:36.2042492Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-12-04T11:12:36.2042569Z * [new branch] zxiiro/build-times -> origin/zxiiro/build-times 2025-12-04T11:12:36.2042645Z * [new branch] zxiiro/c7i.2xlarge -> origin/zxiiro/c7i.2xlarge 2025-12-04T11:12:36.2042729Z * [new branch] zxiiro/c7i.2xlarge.h100 -> origin/zxiiro/c7i.2xlarge.h100 2025-12-04T11:12:36.2042795Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-12-04T11:12:36.2042864Z * [new branch] zxiiro/risc64 -> origin/zxiiro/risc64 2025-12-04T11:12:36.2042959Z * [new branch] zxiiro/test-multicloud-arc -> origin/zxiiro/test-multicloud-arc 2025-12-04T11:12:36.2043040Z t [tag update] ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355 2025-12-04T11:12:36.2043112Z t [tag update] ciflow/inductor/169355 -> ciflow/inductor/169355 2025-12-04T11:12:36.2043180Z t [tag update] ciflow/trunk/169355 -> ciflow/trunk/169355 2025-12-04T11:12:36.4350568Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T11:12:36.4555611Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:12:36.4561967Z ##[endgroup] 2025-12-04T11:12:36.4562364Z ##[group]Determining the checkout info 2025-12-04T11:12:36.4562782Z ##[endgroup] 2025-12-04T11:12:36.4568558Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T11:12:36.4659770Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T11:12:36.4676771Z ##[group]Checking out the ref 2025-12-04T11:12:36.4678474Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:12:36.5642483Z Previous HEAD position was c0cb6e784044 [DTensor] ExplicitRedistributionContext warning mode (#169452) 2025-12-04T11:12:36.5647398Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T11:12:36.5757515Z ##[endgroup] 2025-12-04T11:12:36.5757744Z ##[group]Setting up auth for fetching submodules 2025-12-04T11:12:36.5764985Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T11:12:36.5801738Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T11:12:36.5822868Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T11:12:36.5848428Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T11:12:36.5874664Z ##[endgroup] 2025-12-04T11:12:36.5874886Z ##[group]Fetching submodules 2025-12-04T11:12:36.5876686Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T11:12:36.6055060Z Synchronizing submodule url for 'android/libs/fbjni' 2025-12-04T11:12:36.6064717Z Synchronizing submodule url for 'third_party/FP16' 2025-12-04T11:12:36.6076368Z Synchronizing submodule url for 'third_party/FXdiv' 2025-12-04T11:12:36.6090659Z Synchronizing submodule url for 'third_party/NNPACK' 2025-12-04T11:12:36.6101836Z Synchronizing submodule url for 'third_party/NVTX' 2025-12-04T11:12:36.6111387Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 
2025-12-04T11:12:36.6120689Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-12-04T11:12:36.6136266Z Synchronizing submodule url for 'third_party/aiter' 2025-12-04T11:12:36.6157926Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:36.6172629Z Synchronizing submodule url for 'third_party/benchmark' 2025-12-04T11:12:36.6187477Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-12-04T11:12:36.6208064Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-12-04T11:12:36.6220487Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-12-04T11:12:36.6233806Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-12-04T11:12:36.6250259Z Synchronizing submodule url for 'third_party/cutlass' 2025-12-04T11:12:36.6269040Z Synchronizing submodule url for 'third_party/fbgemm' 2025-12-04T11:12:36.6281577Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:36.6292920Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:36.6306148Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:36.6316282Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:36.6338036Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:36.6350196Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:36.6365424Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-12-04T11:12:36.6379312Z Synchronizing submodule url for 'third_party/flash-attention' 2025-12-04T11:12:36.6400578Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:36.6413998Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:36.6429219Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-12-04T11:12:36.6447051Z Synchronizing submodule url for 'third_party/fmt' 2025-12-04T11:12:36.6458570Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:36.6469748Z Synchronizing submodule url for 'third_party/gloo' 2025-12-04T11:12:36.6480068Z Synchronizing submodule url for 'third_party/googletest' 2025-12-04T11:12:36.6494741Z Synchronizing submodule url for 'third_party/ideep' 2025-12-04T11:12:36.6504516Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:36.6517580Z Synchronizing submodule url for 'third_party/ittapi' 2025-12-04T11:12:36.6527348Z Synchronizing submodule url for 'third_party/kineto' 2025-12-04T11:12:36.6538056Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:36.6547772Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:36.6560308Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:36.6573616Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:36.6583344Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:36.6597964Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:36.6610258Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 
2025-12-04T11:12:36.6622627Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:36.6633626Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:36.6645974Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:36.6656447Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:36.6667546Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:36.6678540Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:36.6692545Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:36.6704852Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:36.6717077Z Synchronizing submodule url for 'third_party/kleidiai' 2025-12-04T11:12:36.6732165Z Synchronizing submodule url for 'third_party/mimalloc' 2025-12-04T11:12:36.6748428Z Synchronizing submodule url for 'third_party/nlohmann' 2025-12-04T11:12:36.6763352Z Synchronizing submodule url for 'third_party/onnx' 2025-12-04T11:12:36.6779537Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:36.6794752Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-12-04T11:12:36.6807378Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:36.6818834Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:36.6829262Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:36.6839853Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:36.6854723Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:36.6865418Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:36.6875490Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:36.6891228Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:36.6903811Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:36.6916030Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:36.6935744Z Synchronizing submodule url for 'third_party/pocketfft' 2025-12-04T11:12:36.6945324Z Synchronizing submodule url for 'third_party/protobuf' 2025-12-04T11:12:36.6955848Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:36.6976049Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:36.6997467Z Synchronizing submodule url for 'third_party/psimd' 2025-12-04T11:12:36.7010105Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-12-04T11:12:36.7021145Z Synchronizing submodule url for 'third_party/pybind11' 2025-12-04T11:12:36.7032332Z Synchronizing submodule url for 'third_party/python-peachpy' 
2025-12-04T11:12:36.7046168Z Synchronizing submodule url for 'third_party/sleef' 2025-12-04T11:12:36.7057125Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-12-04T11:12:36.7072134Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:36.7086604Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:36.7099994Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:36.7109757Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:36.7122204Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:36.7148058Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T11:12:36.7470937Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T11:12:36.7546127Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T11:12:36.7598373Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T11:12:36.7723494Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T11:12:36.7796999Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T11:12:36.7856941Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T11:12:36.9292755Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T11:12:36.9436503Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T11:12:36.9646890Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T11:12:36.9774209Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T11:12:36.9985400Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T11:12:37.0053369Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T11:12:37.0689348Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T11:12:37.0781693Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T11:12:37.0913284Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T11:12:37.1627078Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T11:12:37.1943779Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T11:12:37.3611624Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T11:12:37.4253127Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T11:12:37.8607444Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T11:12:37.8828062Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 
2025-12-04T11:12:37.8911561Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T11:12:37.9455753Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T11:12:37.9579850Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T11:12:37.9779331Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T11:12:37.9915393Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T11:12:38.0025442Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T11:12:38.0174879Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T11:12:38.0382938Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T11:12:38.0499465Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T11:12:38.0685139Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T11:12:38.0782658Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T11:12:38.4584075Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T11:12:38.4690662Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T11:12:38.4785011Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T11:12:38.4892285Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T11:12:38.4987715Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T11:12:38.5066907Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T11:12:38.5153719Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T11:12:38.5216365Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T11:12:38.5284141Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T11:12:38.5372163Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T11:12:38.5446917Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T11:12:38.5534386Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T11:12:38.5588176Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T11:12:38.5646399Z Submodule path 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T11:12:38.5735251Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T11:12:38.5793736Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T11:12:38.5846316Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T11:12:38.5914231Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T11:12:38.5994458Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T11:12:38.6087929Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T11:12:38.6206174Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T11:12:38.7910573Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T11:12:38.8113342Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T11:12:38.8242620Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T11:12:38.8336853Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T11:12:38.8406864Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T11:12:38.8469413Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T11:12:38.8565776Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T11:12:38.8619505Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T11:12:38.8663853Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T11:12:38.8731075Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T11:12:38.8809311Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T11:12:38.8868536Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T11:12:38.9009127Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T11:12:38.9093005Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T11:12:39.0389214Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T11:12:39.0485267Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out 
'5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T11:12:39.0705641Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T11:12:39.0773196Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T11:12:39.0858118Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T11:12:39.1052935Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T11:12:39.1282175Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T11:12:39.1537812Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T11:12:39.1652311Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T11:12:39.1861654Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T11:12:39.1954364Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T11:12:39.2246742Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T11:12:39.2400741Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T11:12:39.2460821Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T11:12:39.2502206Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T11:12:39.2725414Z Entering 'android/libs/fbjni' 2025-12-04T11:12:39.2758818Z Entering 'third_party/FP16' 2025-12-04T11:12:39.2784301Z Entering 'third_party/FXdiv' 2025-12-04T11:12:39.2806917Z Entering 'third_party/NNPACK' 2025-12-04T11:12:39.2831115Z Entering 'third_party/NVTX' 2025-12-04T11:12:39.2858876Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:39.2877932Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:39.2909560Z Entering 'third_party/aiter' 2025-12-04T11:12:39.2929735Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:39.2958554Z Entering 'third_party/benchmark' 2025-12-04T11:12:39.2978748Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:39.3002352Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:39.3024240Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:39.3045038Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:39.3064903Z Entering 'third_party/cutlass' 2025-12-04T11:12:39.3091220Z Entering 'third_party/fbgemm' 2025-12-04T11:12:39.3110726Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:39.3130928Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:39.3153850Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:39.3176220Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:39.3199922Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:39.3228464Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:39.3249830Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:39.3282742Z Entering 'third_party/flash-attention' 2025-12-04T11:12:39.3303886Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:39.3340747Z Entering 
'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:39.3366911Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:39.3404599Z Entering 'third_party/fmt' 2025-12-04T11:12:39.3430900Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:39.3452103Z Entering 'third_party/gloo' 2025-12-04T11:12:39.3472211Z Entering 'third_party/googletest' 2025-12-04T11:12:39.3493940Z Entering 'third_party/ideep' 2025-12-04T11:12:39.3515025Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:39.3538949Z Entering 'third_party/ittapi' 2025-12-04T11:12:39.3569607Z Entering 'third_party/kineto' 2025-12-04T11:12:39.3593804Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:39.3615920Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:39.3636690Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:39.3668283Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:39.3691581Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:39.3720074Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:39.3741301Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:39.3766533Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:39.3787995Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:39.3813176Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:39.3842697Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:39.3871270Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:39.3891311Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:39.3920486Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:39.3942632Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:39.3972830Z Entering 'third_party/kleidiai' 2025-12-04T11:12:39.4001859Z Entering 'third_party/mimalloc' 2025-12-04T11:12:39.4025264Z Entering 'third_party/nlohmann' 2025-12-04T11:12:39.4051228Z Entering 'third_party/onnx' 2025-12-04T11:12:39.4084362Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:39.4109843Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:39.4141524Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:39.4161965Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:39.4182163Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:39.4211579Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:39.4234401Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:39.4255010Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:39.4282279Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:39.4305982Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:39.4336354Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:39.4358945Z Entering 
'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:39.4389655Z Entering 'third_party/pocketfft' 2025-12-04T11:12:39.4416569Z Entering 'third_party/protobuf' 2025-12-04T11:12:39.4442924Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:39.4463311Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:39.4490046Z Entering 'third_party/psimd' 2025-12-04T11:12:39.4512494Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:39.4535578Z Entering 'third_party/pybind11' 2025-12-04T11:12:39.4557401Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:39.4583340Z Entering 'third_party/sleef' 2025-12-04T11:12:39.4609296Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:39.4639604Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:39.4670510Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:39.4694902Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:39.4721323Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:39.4744242Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:39.4781000Z ##[endgroup] 2025-12-04T11:12:39.4781192Z ##[group]Persisting credentials for submodules 2025-12-04T11:12:39.4789159Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T11:12:39.4974031Z Entering 'android/libs/fbjni' 2025-12-04T11:12:39.4994968Z Entering 'third_party/FP16' 2025-12-04T11:12:39.5024194Z Entering 'third_party/FXdiv' 2025-12-04T11:12:39.5047542Z Entering 'third_party/NNPACK' 2025-12-04T11:12:39.5076091Z Entering 'third_party/NVTX' 2025-12-04T11:12:39.5103048Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:39.5127904Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:39.5164528Z Entering 'third_party/aiter' 2025-12-04T11:12:39.5194791Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:39.5220352Z Entering 'third_party/benchmark' 2025-12-04T11:12:39.5245310Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:39.5280107Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:39.5307916Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:39.5337719Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:39.5362580Z Entering 'third_party/cutlass' 2025-12-04T11:12:39.5403382Z Entering 'third_party/fbgemm' 2025-12-04T11:12:39.5430882Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:39.5458665Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:39.5485806Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:39.5510632Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:39.5537190Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:39.5561337Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:39.5584144Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:39.5608506Z Entering 'third_party/flash-attention' 2025-12-04T11:12:39.5638712Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:39.5671530Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:39.5706270Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:39.5732975Z Entering 'third_party/fmt' 2025-12-04T11:12:39.5761156Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:39.5790510Z Entering 'third_party/gloo' 
2025-12-04T11:12:39.5813500Z Entering 'third_party/googletest' 2025-12-04T11:12:39.5835593Z Entering 'third_party/ideep' 2025-12-04T11:12:39.5858110Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:39.5887753Z Entering 'third_party/ittapi' 2025-12-04T11:12:39.5909233Z Entering 'third_party/kineto' 2025-12-04T11:12:39.5931050Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:39.5963770Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:39.5997150Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:39.6026859Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:39.6054319Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:39.6081300Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:39.6111984Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:39.6140597Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:39.6168807Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:39.6195668Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:39.6221713Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:39.6245026Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:39.6271974Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:39.6306651Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:39.6338994Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:39.6370512Z Entering 'third_party/kleidiai' 2025-12-04T11:12:39.6393752Z Entering 'third_party/mimalloc' 2025-12-04T11:12:39.6416914Z Entering 'third_party/nlohmann' 2025-12-04T11:12:39.6439975Z Entering 'third_party/onnx' 2025-12-04T11:12:39.6474375Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:39.6501448Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:39.6525269Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:39.6552278Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:39.6572571Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:39.6597626Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:39.6619405Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:39.6639035Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:39.6659078Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:39.6681810Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:39.6706021Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:39.6751603Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:39.6802089Z Entering 'third_party/pocketfft' 2025-12-04T11:12:39.6834513Z Entering 'third_party/protobuf' 2025-12-04T11:12:39.6860690Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:39.6907871Z Entering 
'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:39.6936746Z Entering 'third_party/psimd' 2025-12-04T11:12:39.6963558Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:39.6989813Z Entering 'third_party/pybind11' 2025-12-04T11:12:39.7013490Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:39.7033372Z Entering 'third_party/sleef' 2025-12-04T11:12:39.7056006Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:39.7077782Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:39.7109929Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:39.7133266Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:39.7156158Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:39.7188359Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:39.7231733Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T11:12:39.7405997Z Entering 'android/libs/fbjni' 2025-12-04T11:12:39.7428857Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T11:12:39.7439601Z Entering 'third_party/FP16' 2025-12-04T11:12:39.7460676Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T11:12:39.7470602Z Entering 'third_party/FXdiv' 2025-12-04T11:12:39.7493440Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T11:12:39.7506280Z Entering 'third_party/NNPACK' 2025-12-04T11:12:39.7533251Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T11:12:39.7542647Z Entering 'third_party/NVTX' 2025-12-04T11:12:39.7563322Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T11:12:39.7579472Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:39.7604166Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T11:12:39.7613196Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:39.7637591Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T11:12:39.7655103Z Entering 'third_party/aiter' 2025-12-04T11:12:39.7673259Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T11:12:39.7685871Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:39.7721822Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T11:12:39.7738484Z Entering 'third_party/benchmark' 2025-12-04T11:12:39.7756815Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:39.7771425Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:39.7793072Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T11:12:39.7806778Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:39.7828906Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T11:12:39.7838695Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:39.7868602Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T11:12:39.7879774Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:39.7900473Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T11:12:39.7910548Z Entering 'third_party/cutlass' 2025-12-04T11:12:39.7931492Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T11:12:39.7946655Z Entering 'third_party/fbgemm' 2025-12-04T11:12:39.7967445Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T11:12:39.7979012Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:39.8005096Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T11:12:39.8013564Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:39.8035947Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T11:12:39.8051332Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:39.8070710Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T11:12:39.8087322Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:39.8108971Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T11:12:39.8121325Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:39.8138668Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T11:12:39.8149778Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:39.8168245Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T11:12:39.8176468Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:39.8193525Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T11:12:39.8204994Z Entering 'third_party/flash-attention' 2025-12-04T11:12:39.8233949Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T11:12:39.8245951Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:39.8268040Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T11:12:39.8280176Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:39.8298074Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T11:12:39.8312986Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:39.8336684Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T11:12:39.8346563Z Entering 'third_party/fmt' 2025-12-04T11:12:39.8372930Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T11:12:39.8383996Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:39.8406083Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config 
remote.origin.url 2025-12-04T11:12:39.8415586Z Entering 'third_party/gloo' 2025-12-04T11:12:39.8436052Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T11:12:39.8444878Z Entering 'third_party/googletest' 2025-12-04T11:12:39.8465666Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:39.8479798Z Entering 'third_party/ideep' 2025-12-04T11:12:39.8498352Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T11:12:39.8507928Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:39.8544230Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T11:12:39.8559436Z Entering 'third_party/ittapi' 2025-12-04T11:12:39.8577628Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T11:12:39.8591812Z Entering 'third_party/kineto' 2025-12-04T11:12:39.8613832Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T11:12:39.8623302Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:39.8642111Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T11:12:39.8651152Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:39.8672534Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T11:12:39.8682549Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:39.8702735Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T11:12:39.8717367Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:39.8737122Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T11:12:39.8745549Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:39.8764874Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T11:12:39.8773313Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:39.8811681Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T11:12:39.8823512Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:39.8842510Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T11:12:39.8851368Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:39.8869509Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:39.8881688Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:39.8903859Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T11:12:39.8913171Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:39.8930764Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T11:12:39.8939241Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:39.8959329Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T11:12:39.8973567Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:39.8994103Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T11:12:39.9004171Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:39.9025359Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T11:12:39.9037222Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:39.9064417Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T11:12:39.9074397Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:39.9096040Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T11:12:39.9112129Z Entering 'third_party/kleidiai' 2025-12-04T11:12:39.9131146Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T11:12:39.9141037Z Entering 'third_party/mimalloc' 2025-12-04T11:12:39.9161611Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T11:12:39.9171508Z Entering 'third_party/nlohmann' 2025-12-04T11:12:39.9190060Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T11:12:39.9200034Z Entering 'third_party/onnx' 2025-12-04T11:12:39.9216951Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T11:12:39.9231641Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:39.9256997Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:39.9269674Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:39.9294837Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T11:12:39.9304854Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:39.9342081Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:39.9355762Z 
Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:39.9386991Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:39.9397368Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:39.9418029Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T11:12:39.9427342Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:39.9447573Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T11:12:39.9456721Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:39.9476499Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T11:12:39.9485200Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:39.9506567Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T11:12:39.9520334Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:39.9544181Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T11:12:39.9553073Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:39.9590570Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T11:12:39.9602372Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:39.9624798Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T11:12:39.9639857Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:39.9662806Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T11:12:39.9687331Z Entering 'third_party/pocketfft' 2025-12-04T11:12:39.9708970Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T11:12:39.9718638Z Entering 'third_party/protobuf' 2025-12-04T11:12:39.9741067Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T11:12:39.9752216Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:39.9787601Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:39.9799450Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:39.9822743Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:39.9835013Z Entering 'third_party/psimd' 2025-12-04T11:12:39.9858696Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T11:12:39.9869808Z Entering 'third_party/pthreadpool' 
2025-12-04T11:12:39.9893224Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T11:12:39.9907055Z Entering 'third_party/pybind11' 2025-12-04T11:12:39.9930580Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:39.9942922Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:39.9962601Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T11:12:39.9974347Z Entering 'third_party/sleef' 2025-12-04T11:12:39.9994255Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T11:12:40.0003369Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:40.0027554Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T11:12:40.0036211Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:40.0060023Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:40.0071185Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:40.0093048Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T11:12:40.0103250Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:40.0129989Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T11:12:40.0140862Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:40.0161076Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:40.0171508Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:40.0198332Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T11:12:40.0418927Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T11:12:40.0603258Z Entering 'android/libs/fbjni' 2025-12-04T11:12:40.0632043Z Entering 'third_party/FP16' 2025-12-04T11:12:40.0659416Z Entering 'third_party/FXdiv' 2025-12-04T11:12:40.0683055Z Entering 'third_party/NNPACK' 2025-12-04T11:12:40.0713720Z Entering 'third_party/NVTX' 2025-12-04T11:12:40.0742220Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:40.0772710Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:40.0808361Z Entering 'third_party/aiter' 2025-12-04T11:12:40.0863686Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:40.0872026Z Entering 'third_party/benchmark' 2025-12-04T11:12:40.0899267Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:40.0923261Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:40.0946109Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:40.0970954Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:40.0997649Z Entering 'third_party/cutlass' 2025-12-04T11:12:40.1027067Z Entering 'third_party/fbgemm' 2025-12-04T11:12:40.1052934Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:40.1080710Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:40.1105386Z Entering 
'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:40.1137763Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:40.1164759Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:40.1191477Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:40.1218384Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:40.1245678Z Entering 'third_party/flash-attention' 2025-12-04T11:12:40.1272670Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:40.1316281Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:40.1350968Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:40.1374471Z Entering 'third_party/fmt' 2025-12-04T11:12:40.1398262Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:40.1421802Z Entering 'third_party/gloo' 2025-12-04T11:12:40.1443953Z Entering 'third_party/googletest' 2025-12-04T11:12:40.1469023Z Entering 'third_party/ideep' 2025-12-04T11:12:40.1492582Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:40.1516983Z Entering 'third_party/ittapi' 2025-12-04T11:12:40.1541506Z Entering 'third_party/kineto' 2025-12-04T11:12:40.1569842Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:40.1598455Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:40.1626494Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:40.1653804Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:40.1686234Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:40.1710677Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:40.1733512Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:40.1753338Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:40.1777503Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:40.1800897Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:40.1825005Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:40.1847150Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:40.1869558Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:40.1900047Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:40.1920031Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:40.1942374Z Entering 'third_party/kleidiai' 2025-12-04T11:12:40.1967880Z Entering 'third_party/mimalloc' 2025-12-04T11:12:40.1991257Z Entering 'third_party/nlohmann' 2025-12-04T11:12:40.2018862Z Entering 'third_party/onnx' 2025-12-04T11:12:40.2051047Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:40.2080369Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:40.2110532Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:40.2132718Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:40.2152153Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:40.2174044Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:40.2203730Z Entering 
'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:40.2224664Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:40.2243345Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:40.2279925Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:40.2311880Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:40.2351055Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:40.2390497Z Entering 'third_party/pocketfft' 2025-12-04T11:12:40.2419066Z Entering 'third_party/protobuf' 2025-12-04T11:12:40.2447769Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:40.2474899Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:40.2509044Z Entering 'third_party/psimd' 2025-12-04T11:12:40.2537559Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:40.2564993Z Entering 'third_party/pybind11' 2025-12-04T11:12:40.2591768Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:40.2617372Z Entering 'third_party/sleef' 2025-12-04T11:12:40.2640058Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:40.2666351Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:40.2685983Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:40.2705726Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:40.2725917Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:40.2751319Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:40.2789593Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T11:12:40.2981469Z Entering 'android/libs/fbjni' 2025-12-04T11:12:40.3006329Z Entering 'third_party/FP16' 2025-12-04T11:12:40.3027604Z Entering 'third_party/FXdiv' 2025-12-04T11:12:40.3046878Z Entering 'third_party/NNPACK' 2025-12-04T11:12:40.3068894Z Entering 'third_party/NVTX' 2025-12-04T11:12:40.3095253Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:40.3117466Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:40.3143907Z Entering 'third_party/aiter' 2025-12-04T11:12:40.3165850Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:40.3193837Z Entering 'third_party/benchmark' 2025-12-04T11:12:40.3213768Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:40.3236823Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:40.3262647Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:40.3290345Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:40.3312840Z Entering 'third_party/cutlass' 2025-12-04T11:12:40.3338876Z Entering 'third_party/fbgemm' 2025-12-04T11:12:40.3365141Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:40.3390720Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:40.3419940Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:40.3444035Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:40.3478544Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:40.3505068Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:40.3532392Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:40.3556269Z Entering 'third_party/flash-attention' 2025-12-04T11:12:40.3578779Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-12-04T11:12:40.3601201Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:40.3634869Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:40.3656675Z Entering 'third_party/fmt' 2025-12-04T11:12:40.3678300Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:40.3709222Z Entering 'third_party/gloo' 2025-12-04T11:12:40.3736097Z Entering 'third_party/googletest' 2025-12-04T11:12:40.3759939Z Entering 'third_party/ideep' 2025-12-04T11:12:40.3784390Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:40.3812487Z Entering 'third_party/ittapi' 2025-12-04T11:12:40.3846366Z Entering 'third_party/kineto' 2025-12-04T11:12:40.3867452Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:40.3886311Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:40.3907733Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:40.3929350Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:40.3952243Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:40.3982452Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:40.4009987Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:40.4042233Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:40.4066041Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:40.4090783Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:40.4117019Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:40.4140105Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:40.4173142Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:40.4201996Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:40.4225303Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:40.4252085Z Entering 'third_party/kleidiai' 2025-12-04T11:12:40.4274610Z Entering 'third_party/mimalloc' 2025-12-04T11:12:40.4297132Z Entering 'third_party/nlohmann' 2025-12-04T11:12:40.4319053Z Entering 'third_party/onnx' 2025-12-04T11:12:40.4345912Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:40.4381641Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:40.4405614Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:40.4430157Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:40.4453955Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:40.4481185Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:40.4508802Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:40.4542994Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:40.4582444Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:40.4606723Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:40.4630899Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 
2025-12-04T11:12:40.4656600Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:40.4705164Z Entering 'third_party/pocketfft' 2025-12-04T11:12:40.4739038Z Entering 'third_party/protobuf' 2025-12-04T11:12:40.4765759Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:40.4786911Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:40.4812662Z Entering 'third_party/psimd' 2025-12-04T11:12:40.4838519Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:40.4863052Z Entering 'third_party/pybind11' 2025-12-04T11:12:40.4883550Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:40.4903871Z Entering 'third_party/sleef' 2025-12-04T11:12:40.4926690Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:40.4950967Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:40.4974839Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:40.4999436Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:40.5026812Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:40.5050456Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:40.5091355Z ##[endgroup] 2025-12-04T11:12:40.5873615Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T11:12:40.6004630Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:12:40.6150010Z ##[group]Run actions/checkout@v4 2025-12-04T11:12:40.6150149Z with: 2025-12-04T11:12:40.6150259Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:12:40.6150395Z fetch-depth: 0 2025-12-04T11:12:40.6150494Z submodules: recursive 2025-12-04T11:12:40.6150601Z show-progress: false 2025-12-04T11:12:40.6150716Z repository: pytorch/pytorch 2025-12-04T11:12:40.6150868Z token: *** 2025-12-04T11:12:40.6150961Z ssh-strict: true 2025-12-04T11:12:40.6151070Z ssh-user: git 2025-12-04T11:12:40.6151169Z persist-credentials: true 2025-12-04T11:12:40.6151280Z clean: true 2025-12-04T11:12:40.6151381Z sparse-checkout-cone-mode: true 2025-12-04T11:12:40.6151511Z fetch-tags: false 2025-12-04T11:12:40.6151608Z lfs: false 2025-12-04T11:12:40.6151699Z set-safe-directory: true 2025-12-04T11:12:40.6151801Z env: 2025-12-04T11:12:40.6151891Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:40.6151991Z ##[endgroup] 2025-12-04T11:12:40.6632913Z Syncing repository: pytorch/pytorch 2025-12-04T11:12:40.6633250Z ##[group]Getting Git version info 2025-12-04T11:12:40.6633473Z Working directory is '/home/runner/_work/pytorch/pytorch' 2025-12-04T11:12:40.6646390Z [command]/usr/bin/git version 2025-12-04T11:12:40.6674867Z git version 2.52.0 2025-12-04T11:12:40.6696640Z ##[endgroup] 2025-12-04T11:12:40.6702934Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/25f07e54-3b35-46fb-b826-eaa3c807e705/.gitconfig' 2025-12-04T11:12:40.6708941Z Temporarily overriding HOME='/home/runner/_work/_temp/25f07e54-3b35-46fb-b826-eaa3c807e705' before making global git config changes 2025-12-04T11:12:40.6709379Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T11:12:40.6719168Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-12-04T11:12:40.6755652Z [command]/usr/bin/git config --local --get remote.origin.url 2025-12-04T11:12:40.6787455Z https://github.com/pytorch/pytorch 2025-12-04T11:12:40.6807724Z ##[group]Removing previously created refs, to avoid conflicts 2025-12-04T11:12:40.6811835Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 
2025-12-04T11:12:40.6833243Z HEAD 2025-12-04T11:12:40.6870966Z ##[endgroup] 2025-12-04T11:12:40.6873285Z [command]/usr/bin/git submodule status 2025-12-04T11:12:40.7141625Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe) 2025-12-04T11:12:40.7196254Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081) 2025-12-04T11:12:40.7255262Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327) 2025-12-04T11:12:40.7323297Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0) 2025-12-04T11:12:40.7370100Z 3ebbc93ded7285963bff932c678fa367eb393ba6 third_party/NVTX (v3.1.0-313-g3ebbc93) 2025-12-04T11:12:40.7421626Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600) 2025-12-04T11:12:40.7730860Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656) 2025-12-04T11:12:40.7763221Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101) 2025-12-04T11:12:40.7786570Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3) 2025-12-04T11:12:40.7851727Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d) 2025-12-04T11:12:40.7953466Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0) 2025-12-04T11:12:40.8027974Z f858c30bcb16f8effd5ff46996f0514539e17abc third_party/cpuinfo (f858c30) 2025-12-04T11:12:40.8053272Z 0b1577c8c83401237d601d0d0db5210506705396 third_party/cudnn_frontend (v0.5-61-g0b1577c) 2025-12-04T11:12:40.8126109Z f88806b1e31dfa579842638740216dd41fc6c588 third_party/cutlass (v4.3.1) 2025-12-04T11:12:40.8145788Z c0b988d39a9e47c794d699f29930ed4d7c7e13a4 third_party/fbgemm (v1.4.0-rc1-2-gc0b988d39) 2025-12-04T11:12:40.8211837Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4) 2025-12-04T11:12:40.8230904Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23) 2025-12-04T11:12:40.8463072Z 407c905e45ad75fc29bf0f9bb7c5c2fd3475976f third_party/fmt (12.1.0) 2025-12-04T11:12:40.8544004Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17) 2025-12-04T11:12:40.8628133Z 54cbae0d3a67fa890b4c3d9ee162b7860315e341 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-37-g54cbae0) 2025-12-04T11:12:40.8780290Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108) 2025-12-04T11:12:40.8851621Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1) 2025-12-04T11:12:40.8903494Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5) 2025-12-04T11:12:40.9043750Z 31f85df8fbd89c188f14ef10f1ec65379786b943 third_party/kineto (heads/main) 2025-12-04T11:12:40.9060680Z d7770c89632329a9914ef1a90289917597639cbe third_party/kleidiai (v1.15.0) 2025-12-04T11:12:40.9078786Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4) 2025-12-04T11:12:40.9093940Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0) 2025-12-04T11:12:40.9309842Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0) 2025-12-04T11:12:40.9329356Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2) 2025-12-04T11:12:40.9358773Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5) 2025-12-04T11:12:40.9572384Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf 
(v3.7.0-rc.2-1279-gd1eca4e4b) 2025-12-04T11:12:40.9621969Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master) 2025-12-04T11:12:40.9660350Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e) 2025-12-04T11:12:40.9681048Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 (v3.0.1) 2025-12-04T11:12:40.9750057Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated) 2025-12-04T11:12:40.9806736Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8) 2025-12-04T11:12:40.9858167Z 2b4cd91092d335a697416b2a3cb398283246849d third_party/tensorpipe (heads/main) 2025-12-04T11:12:40.9870665Z ##[group]Cleaning the repository 2025-12-04T11:12:40.9877088Z [command]/usr/bin/git clean -ffdx 2025-12-04T11:12:41.0001605Z [command]/usr/bin/git reset --hard HEAD 2025-12-04T11:12:41.0895405Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T11:12:41.0978695Z ##[endgroup] 2025-12-04T11:12:41.0982187Z ##[group]Disabling automatic garbage collection 2025-12-04T11:12:41.0988478Z [command]/usr/bin/git config --local gc.auto 0 2025-12-04T11:12:41.1014545Z ##[endgroup] 2025-12-04T11:12:41.1014776Z ##[group]Setting up auth 2025-12-04T11:12:41.1018840Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T11:12:41.1045782Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T11:12:41.1274676Z Entering 'android/libs/fbjni' 2025-12-04T11:12:41.1307535Z Entering 'third_party/FP16' 2025-12-04T11:12:41.1335992Z Entering 'third_party/FXdiv' 2025-12-04T11:12:41.1357281Z Entering 'third_party/NNPACK' 2025-12-04T11:12:41.1378067Z Entering 'third_party/NVTX' 2025-12-04T11:12:41.1408732Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:41.1439393Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:41.1480183Z Entering 'third_party/aiter' 2025-12-04T11:12:41.1512717Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:41.1560448Z Entering 'third_party/benchmark' 2025-12-04T11:12:41.1587820Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:41.1623259Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:41.1653215Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:41.1683452Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:41.1713077Z Entering 'third_party/cutlass' 2025-12-04T11:12:41.1740143Z Entering 'third_party/fbgemm' 2025-12-04T11:12:41.1762733Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:41.1791414Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:41.1826202Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:41.1850272Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:41.1882110Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:41.1917132Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:41.1939257Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:41.1978388Z Entering 'third_party/flash-attention' 2025-12-04T11:12:41.2006581Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:41.2035749Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:41.2066578Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:41.2090429Z Entering 'third_party/fmt' 2025-12-04T11:12:41.2116485Z Entering 
'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:41.2145033Z Entering 'third_party/gloo' 2025-12-04T11:12:41.2166065Z Entering 'third_party/googletest' 2025-12-04T11:12:41.2189288Z Entering 'third_party/ideep' 2025-12-04T11:12:41.2211077Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:41.2246464Z Entering 'third_party/ittapi' 2025-12-04T11:12:41.2273440Z Entering 'third_party/kineto' 2025-12-04T11:12:41.2300549Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:41.2331040Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:41.2356093Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:41.2381094Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:41.2403986Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:41.2425559Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:41.2456737Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:41.2485248Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:41.2508160Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:41.2529360Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:41.2548729Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:41.2570097Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:41.2597442Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:41.2624374Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:41.2649415Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:41.2677295Z Entering 'third_party/kleidiai' 2025-12-04T11:12:41.2700820Z Entering 'third_party/mimalloc' 2025-12-04T11:12:41.2723307Z Entering 'third_party/nlohmann' 2025-12-04T11:12:41.2744088Z Entering 'third_party/onnx' 2025-12-04T11:12:41.2775896Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:41.2806713Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:41.2830378Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:41.2854753Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:41.2879457Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:41.2910471Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:41.2947537Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:41.2976790Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:41.3003489Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:41.3026117Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:41.3062105Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:41.3096866Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:41.3148944Z Entering 'third_party/pocketfft' 2025-12-04T11:12:41.3181135Z Entering 'third_party/protobuf' 2025-12-04T11:12:41.3205039Z Entering 
'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:41.3235448Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:41.3267204Z Entering 'third_party/psimd' 2025-12-04T11:12:41.3291567Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:41.3313145Z Entering 'third_party/pybind11' 2025-12-04T11:12:41.3337062Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:41.3360318Z Entering 'third_party/sleef' 2025-12-04T11:12:41.3383304Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:41.3409760Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:41.3431217Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:41.3452518Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:41.3482893Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:41.3504134Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:41.3556030Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T11:12:41.3574224Z http.https://github.com/.extraheader 2025-12-04T11:12:41.3582485Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T11:12:41.3606810Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T11:12:41.3822875Z Entering 'android/libs/fbjni' 2025-12-04T11:12:41.3845160Z http.https://github.com/.extraheader 2025-12-04T11:12:41.3865092Z Entering 'third_party/FP16' 2025-12-04T11:12:41.3883014Z http.https://github.com/.extraheader 2025-12-04T11:12:41.3911263Z Entering 'third_party/FXdiv' 2025-12-04T11:12:41.3932641Z http.https://github.com/.extraheader 2025-12-04T11:12:41.3958262Z Entering 'third_party/NNPACK' 2025-12-04T11:12:41.3975734Z http.https://github.com/.extraheader 2025-12-04T11:12:41.3999084Z Entering 'third_party/NVTX' 2025-12-04T11:12:41.4020549Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4042135Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:41.4056766Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4077787Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:41.4092461Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4121679Z Entering 'third_party/aiter' 2025-12-04T11:12:41.4139071Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4162035Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:41.4178574Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4213097Z Entering 'third_party/benchmark' 2025-12-04T11:12:41.4232513Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4255678Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:41.4271009Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4298154Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:41.4313807Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4339205Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:41.4359497Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4391395Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:41.4415814Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4439368Z Entering 'third_party/cutlass' 2025-12-04T11:12:41.4460651Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4489363Z Entering 'third_party/fbgemm' 2025-12-04T11:12:41.4515267Z 
http.https://github.com/.extraheader 2025-12-04T11:12:41.4542088Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:41.4567170Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4588648Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:41.4603541Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4627645Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:41.4641958Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4660005Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:41.4680405Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4702124Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:41.4719464Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4738861Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:41.4756139Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4777743Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:41.4792137Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4814523Z Entering 'third_party/flash-attention' 2025-12-04T11:12:41.4834813Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4857263Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:41.4874659Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4903250Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:41.4920766Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4943463Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:41.4961194Z http.https://github.com/.extraheader 2025-12-04T11:12:41.4981373Z Entering 'third_party/fmt' 2025-12-04T11:12:41.5005201Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5024732Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:41.5040801Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5066418Z Entering 'third_party/gloo' 2025-12-04T11:12:41.5082158Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5102101Z Entering 'third_party/googletest' 2025-12-04T11:12:41.5118813Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5138997Z Entering 'third_party/ideep' 2025-12-04T11:12:41.5155539Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5174318Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:41.5187853Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5216793Z Entering 'third_party/ittapi' 2025-12-04T11:12:41.5233057Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5249551Z Entering 'third_party/kineto' 2025-12-04T11:12:41.5273284Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5289993Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:41.5306477Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5323236Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:41.5336888Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5357201Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:41.5377923Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5398069Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:41.5414068Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5433224Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:41.5446711Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5470371Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:41.5484778Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5510052Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:41.5525523Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5542742Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:41.5558965Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5580960Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:41.5598223Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5619781Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:41.5633386Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5652531Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:41.5669091Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5689075Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:41.5705504Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5727677Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:41.5741404Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5765623Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:41.5780190Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5801031Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:41.5818586Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5839101Z Entering 'third_party/kleidiai' 2025-12-04T11:12:41.5862390Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5886713Z Entering 'third_party/mimalloc' 2025-12-04T11:12:41.5901634Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5918558Z Entering 'third_party/nlohmann' 2025-12-04T11:12:41.5932890Z http.https://github.com/.extraheader 2025-12-04T11:12:41.5957327Z Entering 'third_party/onnx' 2025-12-04T11:12:41.5973556Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6003280Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:41.6023078Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6047912Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:41.6064933Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6081554Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:41.6099639Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6116875Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:41.6131713Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6148161Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:41.6161562Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6182609Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:41.6195450Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6214694Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:41.6231925Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6251115Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:41.6270093Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6289205Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:41.6304687Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6324264Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:41.6340661Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6358189Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:41.6376622Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6401542Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:41.6421866Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6450068Z Entering 'third_party/pocketfft' 2025-12-04T11:12:41.6466064Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6494154Z Entering 'third_party/protobuf' 2025-12-04T11:12:41.6509780Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6529345Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:41.6547061Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6567482Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:41.6579939Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6598742Z Entering 'third_party/psimd' 2025-12-04T11:12:41.6612875Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6628633Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:41.6641339Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6663647Z Entering 'third_party/pybind11' 2025-12-04T11:12:41.6678089Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6698293Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:41.6714084Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6731750Z Entering 'third_party/sleef' 2025-12-04T11:12:41.6745258Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6765322Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:41.6782643Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6800545Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:41.6817260Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6838852Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:41.6854311Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6876198Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:41.6895492Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6917625Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:41.6934647Z http.https://github.com/.extraheader 2025-12-04T11:12:41.6960365Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:41.6983415Z http.https://github.com/.extraheader 2025-12-04T11:12:41.7026512Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.7053259Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T11:12:41.7225416Z Entering 'android/libs/fbjni' 2025-12-04T11:12:41.7236640Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T11:12:41.7246256Z Entering 'third_party/FP16' 2025-12-04T11:12:41.7256373Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T11:12:41.7265059Z Entering 'third_party/FXdiv' 2025-12-04T11:12:41.7274307Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config 
remote.origin.url 2025-12-04T11:12:41.7282598Z Entering 'third_party/NNPACK' 2025-12-04T11:12:41.7292618Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T11:12:41.7300850Z Entering 'third_party/NVTX' 2025-12-04T11:12:41.7311969Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T11:12:41.7323098Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:41.7335258Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T11:12:41.7343990Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:41.7358012Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T11:12:41.7374916Z Entering 'third_party/aiter' 2025-12-04T11:12:41.7384569Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T11:12:41.7395578Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:41.7404909Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T11:12:41.7420067Z Entering 'third_party/benchmark' 2025-12-04T11:12:41.7429528Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:41.7438258Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:41.7448426Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T11:12:41.7461599Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:41.7471084Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T11:12:41.7479527Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:41.7489324Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T11:12:41.7500874Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:41.7510619Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T11:12:41.7519299Z Entering 'third_party/cutlass' 2025-12-04T11:12:41.7529220Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T11:12:41.7546974Z Entering 'third_party/fbgemm' 2025-12-04T11:12:41.7556707Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T11:12:41.7567723Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:41.7576614Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T11:12:41.7585457Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:41.7595594Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T11:12:41.7608783Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:41.7623261Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T11:12:41.7632157Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:41.7647355Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T11:12:41.7660471Z Entering 
'third_party/fbgemm/external/googletest' 2025-12-04T11:12:41.7669785Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T11:12:41.7683953Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:41.7693515Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T11:12:41.7701480Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:41.7711784Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T11:12:41.7725805Z Entering 'third_party/flash-attention' 2025-12-04T11:12:41.7735042Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T11:12:41.7743737Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:41.7753687Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T11:12:41.7764115Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:41.7774960Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T11:12:41.7787941Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:41.7799770Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T11:12:41.7809642Z Entering 'third_party/fmt' 2025-12-04T11:12:41.7819917Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T11:12:41.7829624Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:41.7839654Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T11:12:41.7848163Z Entering 'third_party/gloo' 2025-12-04T11:12:41.7858623Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T11:12:41.7867302Z Entering 'third_party/googletest' 2025-12-04T11:12:41.7880557Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:41.7893998Z Entering 'third_party/ideep' 2025-12-04T11:12:41.7909604Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T11:12:41.7917123Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:41.7925927Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T11:12:41.7941589Z Entering 'third_party/ittapi' 2025-12-04T11:12:41.7952023Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T11:12:41.7967695Z Entering 'third_party/kineto' 2025-12-04T11:12:41.7978185Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T11:12:41.7987092Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:41.7996762Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T11:12:41.8012507Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:41.8023678Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T11:12:41.8032307Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:41.8041269Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T11:12:41.8049213Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:41.8061395Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T11:12:41.8076215Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:41.8085997Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T11:12:41.8096319Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:41.8109130Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T11:12:41.8120774Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:41.8131472Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T11:12:41.8141423Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:41.8152370Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:41.8161499Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:41.8173614Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T11:12:41.8183907Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:41.8193316Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T11:12:41.8202197Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:41.8211422Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T11:12:41.8220247Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:41.8229516Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T11:12:41.8239115Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:41.8256974Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config 
remote.origin.url 2025-12-04T11:12:41.8271122Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:41.8279948Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T11:12:41.8289578Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:41.8300179Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T11:12:41.8313371Z Entering 'third_party/kleidiai' 2025-12-04T11:12:41.8322381Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T11:12:41.8331807Z Entering 'third_party/mimalloc' 2025-12-04T11:12:41.8341362Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T11:12:41.8349939Z Entering 'third_party/nlohmann' 2025-12-04T11:12:41.8359517Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T11:12:41.8368406Z Entering 'third_party/onnx' 2025-12-04T11:12:41.8379199Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T11:12:41.8397128Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:41.8406562Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:41.8417759Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:41.8427525Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T11:12:41.8437256Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:41.8445979Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:41.8454734Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:41.8463687Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:41.8474368Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:41.8484134Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T11:12:41.8493904Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:41.8503546Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T11:12:41.8512535Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:41.8521629Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T11:12:41.8529877Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:41.8539012Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T11:12:41.8547067Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:41.8557052Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config 
remote.origin.url 2025-12-04T11:12:41.8564800Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:41.8576181Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T11:12:41.8585185Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:41.8600804Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T11:12:41.8611543Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:41.8623249Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T11:12:41.8639961Z Entering 'third_party/pocketfft' 2025-12-04T11:12:41.8651132Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T11:12:41.8659606Z Entering 'third_party/protobuf' 2025-12-04T11:12:41.8671104Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T11:12:41.8681113Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:41.8706920Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:41.8715543Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:41.8725175Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:41.8735983Z Entering 'third_party/psimd' 2025-12-04T11:12:41.8744900Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T11:12:41.8753042Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:41.8765006Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T11:12:41.8773607Z Entering 'third_party/pybind11' 2025-12-04T11:12:41.8786117Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:41.8794982Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:41.8807687Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T11:12:41.8816258Z Entering 'third_party/sleef' 2025-12-04T11:12:41.8825612Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T11:12:41.8835439Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:41.8845966Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T11:12:41.8854597Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:41.8864002Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:41.8872251Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:41.8883961Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T11:12:41.8892090Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:41.8902633Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T11:12:41.8917217Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:41.8926570Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:41.8933419Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:41.8945032Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T11:12:41.8970236Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.8992593Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9011850Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9028340Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9047416Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9065679Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9081219Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9096851Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9112016Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9127291Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9143211Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9159016Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9181147Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9196685Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9212222Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9228668Z [command]/usr/bin/git 
config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9243854Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9258836Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9274320Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9290982Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9306138Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9321004Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9335204Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9353983Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9369521Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9385184Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9399954Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9415448Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9431360Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9445998Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9461171Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9476678Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9497338Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp 
^includeIf\.gitdir: 2025-12-04T11:12:41.9515979Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9533009Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9553635Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9568830Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9591869Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9608348Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9626118Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9642231Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9658473Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9674034Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9689657Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9705392Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9721451Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9741040Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9756984Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9772270Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9787763Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9803388Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9818392Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9835290Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9850683Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9865969Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9882864Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9899921Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9916706Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9943776Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9960410Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9976715Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:41.9992536Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0009889Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0026566Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0043327Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0060168Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0076353Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0092540Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0110016Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0126754Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0149087Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0165320Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0181758Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0200324Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0216402Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0232482Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0250772Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0266394Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0283502Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0301539Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0317926Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T11:12:42.0340017Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T11:12:42.0364932Z ##[endgroup] 2025-12-04T11:12:42.0365187Z ##[group]Fetching the repository 2025-12-04T11:12:42.0368799Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T11:12:43.3704822Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T11:12:43.3812513Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:12:43.3817633Z ##[endgroup] 2025-12-04T11:12:43.3817976Z ##[group]Determining the checkout info 2025-12-04T11:12:43.3819558Z ##[endgroup] 2025-12-04T11:12:43.3833167Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T11:12:43.3923031Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T11:12:43.3950238Z ##[group]Checking out the ref 2025-12-04T11:12:43.3952289Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:12:43.4223818Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T11:12:43.4231831Z ##[endgroup] 2025-12-04T11:12:43.4232198Z ##[group]Setting up auth for fetching submodules 2025-12-04T11:12:43.4234882Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T11:12:43.4265714Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T11:12:43.4292095Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T11:12:43.4312937Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T11:12:43.4329858Z ##[endgroup] 2025-12-04T11:12:43.4330136Z ##[group]Fetching submodules 2025-12-04T11:12:43.4331582Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T11:12:43.4527022Z Synchronizing submodule url for 'android/libs/fbjni' 2025-12-04T11:12:43.4538350Z Synchronizing submodule url for 'third_party/FP16' 2025-12-04T11:12:43.4550655Z Synchronizing submodule url for 'third_party/FXdiv' 2025-12-04T11:12:43.4563190Z Synchronizing submodule url for 'third_party/NNPACK' 2025-12-04T11:12:43.4573200Z Synchronizing submodule url for 'third_party/NVTX' 2025-12-04T11:12:43.4585575Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:43.4596622Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-12-04T11:12:43.4613550Z Synchronizing submodule url for 'third_party/aiter' 2025-12-04T11:12:43.4627333Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:43.4645891Z Synchronizing submodule url for 'third_party/benchmark' 2025-12-04T11:12:43.4659958Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-12-04T11:12:43.4673015Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-12-04T11:12:43.4689529Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-12-04T11:12:43.4700381Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-12-04T11:12:43.4710515Z Synchronizing submodule url for 'third_party/cutlass' 2025-12-04T11:12:43.4723037Z Synchronizing submodule url 
for 'third_party/fbgemm' 2025-12-04T11:12:43.4737590Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:43.4757840Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:43.4773159Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:43.4783223Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:43.4801742Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:43.4811668Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:43.4821766Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-12-04T11:12:43.4833247Z Synchronizing submodule url for 'third_party/flash-attention' 2025-12-04T11:12:43.4845373Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:43.4856240Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:43.4878112Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-12-04T11:12:43.4890455Z Synchronizing submodule url for 'third_party/fmt' 2025-12-04T11:12:43.4900789Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:43.4911093Z Synchronizing submodule url for 'third_party/gloo' 2025-12-04T11:12:43.4921176Z Synchronizing submodule url for 'third_party/googletest' 2025-12-04T11:12:43.4930180Z Synchronizing submodule url for 'third_party/ideep' 2025-12-04T11:12:43.4940487Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:43.4959325Z Synchronizing submodule url for 'third_party/ittapi' 2025-12-04T11:12:43.4970928Z Synchronizing submodule url for 'third_party/kineto' 2025-12-04T11:12:43.4980807Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:43.4992839Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:43.5001784Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:43.5013115Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:43.5028639Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:43.5038981Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:43.5061436Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:43.5071662Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:43.5081804Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:43.5092638Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:43.5101675Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:43.5113459Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:43.5123969Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 
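The sequence above is the checkout step re-pointing every submodule at HTTPS before anything is fetched: `git submodule sync --recursive` copies the URLs recorded in .gitmodules back into each submodule's remote.origin.url, so the `url.https://github.com/.insteadOf` rewrites and the token extraheader configured a few steps earlier apply to every submodule fetch. A minimal standalone sketch of the same sequence, assuming `<BASE64_TOKEN>` stands in for the credential masked as *** in this log:

# Attach the token header and rewrite SSH-style GitHub URLs to HTTPS
# (<BASE64_TOKEN> is a placeholder, not a value recoverable from this log).
git config --global http.https://github.com/.extraheader "AUTHORIZATION: basic <BASE64_TOKEN>"
git config --global --add url.https://github.com/.insteadOf git@github.com:
# Propagate the .gitmodules URLs into each submodule's remote.origin.url...
git submodule sync --recursive
# ...then force every submodule (recursively) onto the commit its gitlink pins.
git -c protocol.version=2 submodule update --init --force --recursive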
2025-12-04T11:12:43.5138072Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:43.5146954Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:43.5158982Z Synchronizing submodule url for 'third_party/kleidiai' 2025-12-04T11:12:43.5169862Z Synchronizing submodule url for 'third_party/mimalloc' 2025-12-04T11:12:43.5182774Z Synchronizing submodule url for 'third_party/nlohmann' 2025-12-04T11:12:43.5194276Z Synchronizing submodule url for 'third_party/onnx' 2025-12-04T11:12:43.5212140Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:43.5227583Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-12-04T11:12:43.5238616Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:43.5248374Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:43.5258148Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:43.5267739Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:43.5281509Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:43.5292033Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:43.5302910Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:43.5318212Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:43.5333295Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:43.5345472Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:43.5365466Z Synchronizing submodule url for 'third_party/pocketfft' 2025-12-04T11:12:43.5376631Z Synchronizing submodule url for 'third_party/protobuf' 2025-12-04T11:12:43.5393725Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:43.5409304Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:43.5427638Z Synchronizing submodule url for 'third_party/psimd' 2025-12-04T11:12:43.5437554Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-12-04T11:12:43.5446492Z Synchronizing submodule url for 'third_party/pybind11' 2025-12-04T11:12:43.5457095Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-12-04T11:12:43.5467584Z Synchronizing submodule url for 'third_party/sleef' 2025-12-04T11:12:43.5482280Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-12-04T11:12:43.5493293Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:43.5502901Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:43.5514366Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:43.5526524Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:43.5540783Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:43.5577432Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T11:12:43.5870290Z Submodule 
path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T11:12:43.5930116Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T11:12:43.5979170Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T11:12:43.6036757Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T11:12:43.6112020Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T11:12:43.6176071Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T11:12:43.6341599Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T11:12:43.6466297Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T11:12:43.6636738Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T11:12:43.6698002Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T11:12:43.6888831Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T11:12:43.6963382Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T11:12:43.7049844Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T11:12:43.7134902Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T11:12:43.7281911Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T11:12:43.7407173Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T11:12:43.7467323Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T11:12:43.7641450Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T11:12:43.7724942Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T11:12:43.7831077Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T11:12:43.7899632Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T11:12:43.7955460Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T11:12:43.8031635Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T11:12:43.8118627Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T11:12:43.8328103Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T11:12:43.8444123Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T11:12:43.8543685Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T11:12:43.8613067Z Submodule path 
'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T11:12:43.8684373Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T11:12:43.8747586Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T11:12:43.8804958Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T11:12:43.8870735Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T11:12:43.9036251Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T11:12:43.9103994Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T11:12:43.9176647Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T11:12:43.9256303Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T11:12:43.9338924Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T11:12:43.9399777Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T11:12:43.9476434Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T11:12:43.9531013Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T11:12:43.9588881Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T11:12:43.9652598Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T11:12:43.9719956Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T11:12:43.9802650Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T11:12:43.9854242Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T11:12:43.9915058Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T11:12:43.9996557Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T11:12:44.0063972Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T11:12:44.0133636Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T11:12:44.0194374Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T11:12:44.0285453Z Submodule path 'third_party/kleidiai': 
checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T11:12:44.0360075Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T11:12:44.0461152Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T11:12:44.0614637Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T11:12:44.0711474Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T11:12:44.0818328Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T11:12:44.0893979Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T11:12:44.0956204Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T11:12:44.1014989Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T11:12:44.1114409Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T11:12:44.1166322Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T11:12:44.1220254Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T11:12:44.1284766Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T11:12:44.1365632Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T11:12:44.1430823Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T11:12:44.1568147Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T11:12:44.1628489Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T11:12:44.1779226Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T11:12:44.1855881Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T11:12:44.1926442Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T11:12:44.1996201Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T11:12:44.2051867Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T11:12:44.2128459Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T11:12:44.2203246Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T11:12:44.2262314Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T11:12:44.2315411Z Submodule path 'third_party/tensorpipe': checked out 
'2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T11:12:44.2379290Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T11:12:44.2435240Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T11:12:44.2569301Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T11:12:44.2646187Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T11:12:44.2697789Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T11:12:44.2723849Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T11:12:44.2892098Z Entering 'android/libs/fbjni' 2025-12-04T11:12:44.2913406Z Entering 'third_party/FP16' 2025-12-04T11:12:44.2933948Z Entering 'third_party/FXdiv' 2025-12-04T11:12:44.2953742Z Entering 'third_party/NNPACK' 2025-12-04T11:12:44.2972964Z Entering 'third_party/NVTX' 2025-12-04T11:12:44.2992098Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:44.3012157Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:44.3042985Z Entering 'third_party/aiter' 2025-12-04T11:12:44.3068041Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:44.3101514Z Entering 'third_party/benchmark' 2025-12-04T11:12:44.3122918Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:44.3146903Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:44.3165915Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:44.3197932Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:44.3222876Z Entering 'third_party/cutlass' 2025-12-04T11:12:44.3245990Z Entering 'third_party/fbgemm' 2025-12-04T11:12:44.3268253Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:44.3289191Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:44.3314214Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:44.3335016Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:44.3357991Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:44.3378103Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:44.3397241Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:44.3420379Z Entering 'third_party/flash-attention' 2025-12-04T11:12:44.3448040Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:44.3472178Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:44.3499175Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:44.3519506Z Entering 'third_party/fmt' 2025-12-04T11:12:44.3539854Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:44.3565243Z Entering 'third_party/gloo' 2025-12-04T11:12:44.3589226Z Entering 'third_party/googletest' 2025-12-04T11:12:44.3609683Z Entering 'third_party/ideep' 2025-12-04T11:12:44.3629629Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:44.3654856Z Entering 'third_party/ittapi' 2025-12-04T11:12:44.3674520Z Entering 'third_party/kineto' 2025-12-04T11:12:44.3697195Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:44.3716599Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:44.3740977Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 
2025-12-04T11:12:44.3761643Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:44.3780555Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:44.3798360Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:44.3832739Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:44.3861357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:44.3883317Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:44.3908701Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:44.3931085Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:44.3951269Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:44.3978294Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:44.4005125Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:44.4023806Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:44.4043850Z Entering 'third_party/kleidiai' 2025-12-04T11:12:44.4073544Z Entering 'third_party/mimalloc' 2025-12-04T11:12:44.4097215Z Entering 'third_party/nlohmann' 2025-12-04T11:12:44.4126570Z Entering 'third_party/onnx' 2025-12-04T11:12:44.4154041Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:44.4182426Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:44.4203316Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:44.4223554Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:44.4244746Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:44.4267138Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:44.4287276Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:44.4306403Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:44.4327827Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:44.4346451Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:44.4364724Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:44.4385566Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:44.4413004Z Entering 'third_party/pocketfft' 2025-12-04T11:12:44.4432422Z Entering 'third_party/protobuf' 2025-12-04T11:12:44.4452929Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:44.4472200Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:44.4497363Z Entering 'third_party/psimd' 2025-12-04T11:12:44.4517046Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:44.4539596Z Entering 'third_party/pybind11' 2025-12-04T11:12:44.4558470Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:44.4578425Z Entering 'third_party/sleef' 2025-12-04T11:12:44.4598272Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:44.4618129Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:44.4648948Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:44.4668367Z Entering 
'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:44.4687767Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:44.4709162Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:44.4749305Z ##[endgroup] 2025-12-04T11:12:44.4749511Z ##[group]Persisting credentials for submodules 2025-12-04T11:12:44.4755260Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T11:12:44.4955247Z Entering 'android/libs/fbjni' 2025-12-04T11:12:44.4973096Z url.https://github.com/.insteadof 2025-12-04T11:12:44.4973235Z url.https://github.com/.insteadof 2025-12-04T11:12:44.4993136Z Entering 'third_party/FP16' 2025-12-04T11:12:44.5016062Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5016208Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5039013Z Entering 'third_party/FXdiv' 2025-12-04T11:12:44.5060622Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5060759Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5079826Z Entering 'third_party/NNPACK' 2025-12-04T11:12:44.5094347Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5094474Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5113909Z Entering 'third_party/NVTX' 2025-12-04T11:12:44.5128254Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5128380Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5146488Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:44.5165488Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5165621Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5194320Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:44.5210301Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5210431Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5246973Z Entering 'third_party/aiter' 2025-12-04T11:12:44.5262999Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5263126Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5283028Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:44.5302787Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5303065Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5328215Z Entering 'third_party/benchmark' 2025-12-04T11:12:44.5348965Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5349145Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5373261Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:44.5389636Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5389836Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5414403Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:44.5431494Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5431652Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5453253Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:44.5470746Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5471180Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5492285Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:44.5507997Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5508531Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5528635Z Entering 'third_party/cutlass' 2025-12-04T11:12:44.5548278Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5548537Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5577122Z Entering 'third_party/fbgemm' 2025-12-04T11:12:44.5591181Z url.https://github.com/.insteadof 
2025-12-04T11:12:44.5591414Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5614102Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:44.5629466Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5629730Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5648584Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:44.5662996Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5663202Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5686612Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:44.5698778Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5698980Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5716431Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:44.5729546Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5729781Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5756064Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:44.5772893Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5773090Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5790955Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:44.5803711Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5803895Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5823831Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:44.5839463Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5839640Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5865665Z Entering 'third_party/flash-attention' 2025-12-04T11:12:44.5885893Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5886068Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5910504Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:44.5926420Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5926584Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5949980Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:44.5965703Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5965876Z url.https://github.com/.insteadof 2025-12-04T11:12:44.5991847Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:44.6007036Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6007207Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6024299Z Entering 'third_party/fmt' 2025-12-04T11:12:44.6038532Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6038691Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6061537Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:44.6074343Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6074518Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6101308Z Entering 'third_party/gloo' 2025-12-04T11:12:44.6119382Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6119546Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6141124Z Entering 'third_party/googletest' 2025-12-04T11:12:44.6154872Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6155031Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6172766Z Entering 'third_party/ideep' 2025-12-04T11:12:44.6191674Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6191832Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6210686Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:44.6226192Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6226344Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6247851Z Entering 'third_party/ittapi' 2025-12-04T11:12:44.6263784Z url.https://github.com/.insteadof 
2025-12-04T11:12:44.6263942Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6281496Z Entering 'third_party/kineto' 2025-12-04T11:12:44.6297707Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6297937Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6314641Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:44.6331049Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6331280Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6349886Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:44.6363246Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6363389Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6387341Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:44.6403065Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6403209Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6426082Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:44.6442269Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6442422Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6460836Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:44.6473043Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6473193Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6489073Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:44.6504308Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6504464Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6526664Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:44.6542748Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6542896Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6562710Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:44.6579478Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6579632Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6597183Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:44.6620718Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6620877Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6648816Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:44.6669948Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6670130Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6698283Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:44.6716505Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6716739Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6743355Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:44.6761314Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6761531Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6780175Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:44.6803158Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6803398Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6826612Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:44.6840319Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6840527Z url.https://github.com/.insteadof 
2025-12-04T11:12:44.6857041Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:44.6871298Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6871485Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6892890Z Entering 'third_party/kleidiai' 2025-12-04T11:12:44.6906836Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6907012Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6926510Z Entering 'third_party/mimalloc' 2025-12-04T11:12:44.6940321Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6940804Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6960193Z Entering 'third_party/nlohmann' 2025-12-04T11:12:44.6974568Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6974733Z url.https://github.com/.insteadof 2025-12-04T11:12:44.6995355Z Entering 'third_party/onnx' 2025-12-04T11:12:44.7008308Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7008473Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7034623Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:44.7048275Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7048437Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7068523Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:44.7085223Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7085374Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7102824Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:44.7116669Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7116818Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7132847Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:44.7143954Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7144092Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7163118Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:44.7191553Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7191690Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7215340Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:44.7231265Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7231406Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7251505Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:44.7268046Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7268181Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7285253Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:44.7297355Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7297496Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7320090Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:44.7341924Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7342076Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7359231Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:44.7377574Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7377740Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7400243Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:44.7414804Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7414967Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7439986Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:44.7452085Z url.https://github.com/.insteadof 
2025-12-04T11:12:44.7452244Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7476822Z Entering 'third_party/pocketfft' 2025-12-04T11:12:44.7491382Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7491525Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7511750Z Entering 'third_party/protobuf' 2025-12-04T11:12:44.7525535Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7546046Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7546216Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:44.7561187Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7561346Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7582704Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:44.7596004Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7596157Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7613747Z Entering 'third_party/psimd' 2025-12-04T11:12:44.7628079Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7628235Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7645140Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:44.7662817Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7662977Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7681360Z Entering 'third_party/pybind11' 2025-12-04T11:12:44.7694795Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7694957Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7710920Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:44.7725117Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7725400Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7742607Z Entering 'third_party/sleef' 2025-12-04T11:12:44.7754716Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7754875Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7771316Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:44.7783192Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7783359Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7802282Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:44.7818143Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7818298Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7835462Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:44.7847359Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7847528Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7870151Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:44.7881044Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7881214Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7899427Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:44.7911033Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7911195Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7925430Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:44.7937791Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7937930Z url.https://github.com/.insteadof 2025-12-04T11:12:44.7976739Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T11:12:44.8149660Z Entering 'android/libs/fbjni' 2025-12-04T11:12:44.8170586Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T11:12:44.8180074Z Entering 'third_party/FP16' 
2025-12-04T11:12:44.8202266Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T11:12:44.8214078Z Entering 'third_party/FXdiv' 2025-12-04T11:12:44.8248548Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T11:12:44.8260489Z Entering 'third_party/NNPACK' 2025-12-04T11:12:44.8279293Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T11:12:44.8295741Z Entering 'third_party/NVTX' 2025-12-04T11:12:44.8317633Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T11:12:44.8331677Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:44.8354021Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T11:12:44.8368279Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:44.8391163Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T11:12:44.8406422Z Entering 'third_party/aiter' 2025-12-04T11:12:44.8425227Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T11:12:44.8434861Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:44.8457118Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T11:12:44.8481122Z Entering 'third_party/benchmark' 2025-12-04T11:12:44.8504450Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:44.8514487Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:44.8533209Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T11:12:44.8544705Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:44.8565009Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T11:12:44.8574861Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:44.8594480Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T11:12:44.8604140Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:44.8622559Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T11:12:44.8635705Z Entering 'third_party/cutlass' 2025-12-04T11:12:44.8655932Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T11:12:44.8669360Z Entering 'third_party/fbgemm' 2025-12-04T11:12:44.8696434Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T11:12:44.8713368Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:44.8748387Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T11:12:44.8758664Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:44.8798762Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T11:12:44.8817444Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:44.8842951Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T11:12:44.8854382Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:44.8882973Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T11:12:44.8902184Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:44.8922955Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T11:12:44.8932972Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:44.8956593Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T11:12:44.8972774Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:44.8995320Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T11:12:44.9015328Z Entering 'third_party/flash-attention' 2025-12-04T11:12:44.9035217Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T11:12:44.9046084Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:44.9065311Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T11:12:44.9084920Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:44.9107650Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T11:12:44.9121864Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:44.9141649Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T11:12:44.9151810Z Entering 'third_party/fmt' 2025-12-04T11:12:44.9170272Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T11:12:44.9179567Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:44.9199584Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T11:12:44.9210302Z Entering 'third_party/gloo' 2025-12-04T11:12:44.9227844Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T11:12:44.9237433Z Entering 'third_party/googletest' 2025-12-04T11:12:44.9257757Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:44.9267566Z Entering 'third_party/ideep' 2025-12-04T11:12:44.9285882Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T11:12:44.9295498Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:44.9316486Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T11:12:44.9329437Z Entering 'third_party/ittapi' 2025-12-04T11:12:44.9349405Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T11:12:44.9359147Z Entering 'third_party/kineto' 2025-12-04T11:12:44.9380003Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T11:12:44.9388494Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 
2025-12-04T11:12:44.9407401Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T11:12:44.9418053Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:44.9437506Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T11:12:44.9446298Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:44.9470390Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T11:12:44.9480339Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:44.9504935Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T11:12:44.9514374Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:44.9534634Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T11:12:44.9544465Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:44.9567500Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T11:12:44.9578033Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:44.9605457Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T11:12:44.9619060Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:44.9650583Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:44.9663566Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:44.9686594Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T11:12:44.9701314Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:44.9726728Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T11:12:44.9738642Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:44.9768759Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T11:12:44.9781785Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:44.9815275Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T11:12:44.9829498Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:44.9853844Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T11:12:44.9866248Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:44.9887652Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T11:12:44.9896427Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:44.9918274Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T11:12:44.9929243Z Entering 'third_party/kleidiai' 2025-12-04T11:12:44.9954869Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T11:12:44.9965869Z Entering 'third_party/mimalloc' 2025-12-04T11:12:44.9987332Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T11:12:44.9997025Z Entering 'third_party/nlohmann' 2025-12-04T11:12:45.0016633Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T11:12:45.0026818Z Entering 'third_party/onnx' 2025-12-04T11:12:45.0051964Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T11:12:45.0068880Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:45.0090860Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:45.0105974Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:45.0126268Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T11:12:45.0137850Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:45.0158709Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:45.0174237Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:45.0192649Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:45.0202282Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:45.0221725Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T11:12:45.0231432Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:45.0249280Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T11:12:45.0258990Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:45.0298267Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T11:12:45.0312578Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:45.0332148Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T11:12:45.0344476Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:45.0363641Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T11:12:45.0371093Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:45.0395344Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T11:12:45.0406206Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:45.0426074Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T11:12:45.0436589Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:45.0456840Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T11:12:45.0472980Z Entering 'third_party/pocketfft' 2025-12-04T11:12:45.0493374Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T11:12:45.0509214Z Entering 'third_party/protobuf' 2025-12-04T11:12:45.0531743Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T11:12:45.0544424Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:45.0563521Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T11:12:45.0576036Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:45.0597279Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T11:12:45.0610264Z Entering 'third_party/psimd' 2025-12-04T11:12:45.0631658Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T11:12:45.0640627Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:45.0659259Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T11:12:45.0673881Z Entering 'third_party/pybind11' 2025-12-04T11:12:45.0696404Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:45.0706082Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:45.0725563Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T11:12:45.0739472Z Entering 'third_party/sleef' 2025-12-04T11:12:45.0758420Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T11:12:45.0769479Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:45.0791517Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T11:12:45.0800813Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:45.0820849Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 
2025-12-04T11:12:45.0831025Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:45.0864534Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T11:12:45.0877623Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:45.0909567Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T11:12:45.0920707Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:45.0941074Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T11:12:45.0950036Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:45.0975422Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T11:12:45.1240132Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T11:12:45.1417650Z Entering 'android/libs/fbjni' 2025-12-04T11:12:45.1452540Z Entering 'third_party/FP16' 2025-12-04T11:12:45.1480172Z Entering 'third_party/FXdiv' 2025-12-04T11:12:45.1509091Z Entering 'third_party/NNPACK' 2025-12-04T11:12:45.1538273Z Entering 'third_party/NVTX' 2025-12-04T11:12:45.1561682Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:45.1584048Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:45.1614641Z Entering 'third_party/aiter' 2025-12-04T11:12:45.1636516Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:45.1663610Z Entering 'third_party/benchmark' 2025-12-04T11:12:45.1689364Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:45.1717290Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:45.1740873Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:45.1764088Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:45.1794452Z Entering 'third_party/cutlass' 2025-12-04T11:12:45.1823611Z Entering 'third_party/fbgemm' 2025-12-04T11:12:45.1847460Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:45.1869952Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:45.1903299Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:45.1930403Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:45.1961033Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:45.1987977Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:45.2008627Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:45.2031003Z Entering 'third_party/flash-attention' 2025-12-04T11:12:45.2054226Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:45.2077821Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:45.2106995Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:45.2129982Z Entering 'third_party/fmt' 2025-12-04T11:12:45.2153734Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:45.2187142Z Entering 'third_party/gloo' 2025-12-04T11:12:45.2216872Z Entering 'third_party/googletest' 2025-12-04T11:12:45.2241233Z Entering 'third_party/ideep' 2025-12-04T11:12:45.2262770Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:45.2289346Z Entering 'third_party/ittapi' 2025-12-04T11:12:45.2310355Z Entering 'third_party/kineto' 2025-12-04T11:12:45.2331868Z Entering 
'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:45.2352178Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:45.2373374Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:45.2406305Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:45.2429999Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:45.2454213Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:45.2483904Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:45.2510829Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:45.2533494Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:45.2552496Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:45.2573093Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:45.2596305Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:45.2627159Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:45.2656942Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:45.2678482Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:45.2702088Z Entering 'third_party/kleidiai' 2025-12-04T11:12:45.2722879Z Entering 'third_party/mimalloc' 2025-12-04T11:12:45.2749359Z Entering 'third_party/nlohmann' 2025-12-04T11:12:45.2773468Z Entering 'third_party/onnx' 2025-12-04T11:12:45.2803246Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:45.2835205Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:45.2862125Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:45.2883996Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:45.2915612Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:45.2947993Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:45.2971749Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:45.2999280Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:45.3022220Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:45.3047655Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:45.3069101Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:45.3093258Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:45.3124512Z Entering 'third_party/pocketfft' 2025-12-04T11:12:45.3145300Z Entering 'third_party/protobuf' 2025-12-04T11:12:45.3173873Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:45.3198955Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:45.3221266Z Entering 'third_party/psimd' 2025-12-04T11:12:45.3246057Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:45.3274094Z Entering 'third_party/pybind11' 2025-12-04T11:12:45.3300512Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:45.3322163Z Entering 'third_party/sleef' 
2025-12-04T11:12:45.3348883Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:45.3375665Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:45.3401222Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:45.3440855Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:45.3471174Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:45.3497437Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T11:12:45.3541953Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T11:12:45.3726161Z Entering 'android/libs/fbjni' 2025-12-04T11:12:45.3749216Z Entering 'third_party/FP16' 2025-12-04T11:12:45.3770149Z Entering 'third_party/FXdiv' 2025-12-04T11:12:45.3791517Z Entering 'third_party/NNPACK' 2025-12-04T11:12:45.3810581Z Entering 'third_party/NVTX' 2025-12-04T11:12:45.3830157Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T11:12:45.3855979Z Entering 'third_party/XNNPACK' 2025-12-04T11:12:45.3883108Z Entering 'third_party/aiter' 2025-12-04T11:12:45.3909137Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T11:12:45.3943608Z Entering 'third_party/benchmark' 2025-12-04T11:12:45.3974481Z Entering 'third_party/composable_kernel' 2025-12-04T11:12:45.3997615Z Entering 'third_party/cpp-httplib' 2025-12-04T11:12:45.4019284Z Entering 'third_party/cpuinfo' 2025-12-04T11:12:45.4038462Z Entering 'third_party/cudnn_frontend' 2025-12-04T11:12:45.4060988Z Entering 'third_party/cutlass' 2025-12-04T11:12:45.4087875Z Entering 'third_party/fbgemm' 2025-12-04T11:12:45.4116972Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T11:12:45.4138717Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T11:12:45.4161180Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T11:12:45.4185729Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T11:12:45.4213019Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T11:12:45.4232283Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T11:12:45.4251733Z Entering 'third_party/fbgemm/external/json' 2025-12-04T11:12:45.4271945Z Entering 'third_party/flash-attention' 2025-12-04T11:12:45.4292506Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T11:12:45.4325606Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T11:12:45.4351145Z Entering 'third_party/flatbuffers' 2025-12-04T11:12:45.4378414Z Entering 'third_party/fmt' 2025-12-04T11:12:45.4401020Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T11:12:45.4420207Z Entering 'third_party/gloo' 2025-12-04T11:12:45.4440620Z Entering 'third_party/googletest' 2025-12-04T11:12:45.4459296Z Entering 'third_party/ideep' 2025-12-04T11:12:45.4484001Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T11:12:45.4510355Z Entering 'third_party/ittapi' 2025-12-04T11:12:45.4530471Z Entering 'third_party/kineto' 2025-12-04T11:12:45.4551571Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T11:12:45.4570948Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T11:12:45.4590458Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T11:12:45.4608958Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T11:12:45.4626429Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T11:12:45.4651720Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T11:12:45.4687260Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T11:12:45.4707761Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T11:12:45.4730814Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T11:12:45.4749346Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T11:12:45.4778452Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T11:12:45.4797579Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:45.4819010Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:45.4845496Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T11:12:45.4867564Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T11:12:45.4893315Z Entering 'third_party/kleidiai' 2025-12-04T11:12:45.4912867Z Entering 'third_party/mimalloc' 2025-12-04T11:12:45.4933152Z Entering 'third_party/nlohmann' 2025-12-04T11:12:45.4952115Z Entering 'third_party/onnx' 2025-12-04T11:12:45.4981383Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T11:12:45.5008156Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T11:12:45.5030569Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T11:12:45.5056720Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T11:12:45.5075464Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T11:12:45.5098076Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T11:12:45.5119449Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T11:12:45.5142036Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T11:12:45.5160849Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T11:12:45.5180550Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T11:12:45.5210421Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T11:12:45.5232267Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T11:12:45.5261136Z Entering 'third_party/pocketfft' 2025-12-04T11:12:45.5279334Z Entering 'third_party/protobuf' 2025-12-04T11:12:45.5300901Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T11:12:45.5326455Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T11:12:45.5356999Z Entering 'third_party/psimd' 2025-12-04T11:12:45.5375689Z Entering 'third_party/pthreadpool' 2025-12-04T11:12:45.5395913Z Entering 'third_party/pybind11' 2025-12-04T11:12:45.5418235Z Entering 'third_party/python-peachpy' 2025-12-04T11:12:45.5442380Z Entering 'third_party/sleef' 2025-12-04T11:12:45.5464794Z Entering 'third_party/tensorpipe' 2025-12-04T11:12:45.5492396Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T11:12:45.5511258Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T11:12:45.5528538Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T11:12:45.5554540Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T11:12:45.5575371Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 
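The two `git submodule foreach --recursive` passes above install `url.<base>.insteadOf` rewrites in every checked-out submodule, so submodules pinned to SSH-style remotes (`git@github.com:` or the `org-21003710@github.com:` form) are fetched over anonymous HTTPS on the runner. Below is a minimal sketch of the same mechanism on a single repository; only the config keys are taken from the log, the example remote is illustrative:

  # Rewrite SSH-style GitHub remote prefixes to HTTPS for this repository only.
  git config --local --add url.https://github.com/.insteadOf 'git@github.com:'
  git config --local --add url.https://github.com/.insteadOf 'org-21003710@github.com:'
  # Inspect the rewrites now in effect:
  git config --local --get-all url.https://github.com/.insteadOf
  # With these set, a fetch against git@github.com:pytorch/pytorch.git
  # transparently contacts https://github.com/pytorch/pytorch.git instead.

Because the setting is applied with --local inside `git submodule foreach --recursive`, each rewrite rule produces one 'Entering ...' line per submodule, which is why the listing above repeats once per rule.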
2025-12-04T11:12:45.5612742Z ##[endgroup] 2025-12-04T11:12:45.5816012Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T11:12:45.5926070Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:12:45.6067662Z Prepare all required actions 2025-12-04T11:12:45.6067998Z Getting action download info 2025-12-04T11:12:45.8686611Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076) 2025-12-04T11:12:46.5946370Z ##[group]Run ./.github/actions/setup-rocm 2025-12-04T11:12:46.5946505Z env: 2025-12-04T11:12:46.5946593Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.5946693Z ##[endgroup] 2025-12-04T11:12:46.5958735Z ##[group]Run dpkg -l | grep -E " rocm" 2025-12-04T11:12:46.5958873Z dpkg -l | grep -E " rocm" 2025-12-04T11:12:46.5963378Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.5963521Z env: 2025-12-04T11:12:46.5963607Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.5963706Z ##[endgroup] 2025-12-04T11:12:46.6022773Z ii rocm-cmake 0.14.0.60401-83~22.04 amd64 rocm-cmake built using CMake 2025-12-04T11:12:46.6023401Z ii rocm-core 6.4.1.60401-83~22.04 amd64 ROCm Runtime software stack 2025-12-04T11:12:46.6023873Z ii rocm-dbgapi 0.77.2.60401-83~22.04 amd64 Library to provide AMD GPU debugger API 2025-12-04T11:12:46.6024454Z ii rocm-debug-agent 2.0.4.60401-83~22.04 amd64 Radeon Open Compute Debug Agent (ROCdebug-agent) 2025-12-04T11:12:46.6024984Z ii rocm-dev 6.4.1.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-12-04T11:12:46.6025498Z ii rocm-device-libs 1.0.0.60401-83~22.04 amd64 Radeon Open Compute - device libraries 2025-12-04T11:12:46.6025940Z ii rocm-gdb 15.2.60401-83~22.04 amd64 ROCgdb 2025-12-04T11:12:46.6026338Z ii rocm-llvm 19.0.0.25184.60401-83~22.04 amd64 ROCm core compiler 2025-12-04T11:12:46.6026747Z ii rocm-opencl 2.0.0.60401-83~22.04 amd64 clr built using CMake 2025-12-04T11:12:46.6027144Z ii rocm-opencl-dev 2.0.0.60401-83~22.04 amd64 clr built using CMake 2025-12-04T11:12:46.6027559Z ii rocm-smi-lib 7.5.0.60401-83~22.04 amd64 AMD System Management libraries 2025-12-04T11:12:46.6028002Z ii rocm-utils 6.4.1.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-12-04T11:12:46.6028460Z ii rocminfo 1.0.0.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime rocminfo tool 2025-12-04T11:12:46.6046534Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T11:12:46.6046810Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T11:12:46.6047110Z # shellcheck disable=SC2046 2025-12-04T11:12:46.6047256Z docker stop $(docker ps -q) || true 2025-12-04T11:12:46.6047393Z # Prune all stopped containers. 2025-12-04T11:12:46.6047530Z docker container prune -f 2025-12-04T11:12:46.6052026Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.6052182Z env: 2025-12-04T11:12:46.6052274Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.6052387Z ##[endgroup] 2025-12-04T11:12:46.6235901Z docker: 'docker stop' requires at least 1 argument 2025-12-04T11:12:46.6236033Z 2025-12-04T11:12:46.6236107Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 
2025-12-04T11:12:46.6236222Z 2025-12-04T11:12:46.6236295Z See 'docker stop --help' for more information 2025-12-04T11:12:46.6341807Z Total reclaimed space: 0B 2025-12-04T11:12:46.6363035Z ##[group]Run cat /etc/os-release || true 2025-12-04T11:12:46.6363213Z cat /etc/os-release || true 2025-12-04T11:12:46.6363395Z cat /etc/apt/sources.list.d/rocm.list || true 2025-12-04T11:12:46.6363743Z cat /opt/rocm/.info/version || true 2025-12-04T11:12:46.6363882Z whoami 2025-12-04T11:12:46.6367241Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.6367382Z env: 2025-12-04T11:12:46.6367466Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.6367565Z ##[endgroup] 2025-12-04T11:12:46.6393383Z PRETTY_NAME="Ubuntu 22.04.5 LTS" 2025-12-04T11:12:46.6393489Z NAME="Ubuntu" 2025-12-04T11:12:46.6393729Z VERSION_ID="22.04" 2025-12-04T11:12:46.6393831Z VERSION="22.04.5 LTS (Jammy Jellyfish)" 2025-12-04T11:12:46.6393960Z VERSION_CODENAME=jammy 2025-12-04T11:12:46.6394058Z ID=ubuntu 2025-12-04T11:12:46.6394144Z ID_LIKE=debian 2025-12-04T11:12:46.6394419Z HOME_URL="https://www.ubuntu.com/" 2025-12-04T11:12:46.6394559Z SUPPORT_URL="https://help.ubuntu.com/" 2025-12-04T11:12:46.6394705Z BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" 2025-12-04T11:12:46.6394915Z PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" 2025-12-04T11:12:46.6395107Z UBUNTU_CODENAME=jammy 2025-12-04T11:12:46.6401252Z deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.1 jammy main 2025-12-04T11:12:46.6412408Z 6.4.1-83 2025-12-04T11:12:46.6419930Z runner 2025-12-04T11:12:46.6442189Z ##[group]Run dpkg -l | grep -E " amdgpu" 2025-12-04T11:12:46.6442427Z dpkg -l | grep -E " amdgpu" 2025-12-04T11:12:46.6447651Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.6447839Z env: 2025-12-04T11:12:46.6447958Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.6448095Z ##[endgroup] 2025-12-04T11:12:46.6510299Z ii amdgpu-core 1:6.4.60401-2164967.22.04 all Core meta package for unified amdgpu driver. 
2025-12-04T11:12:46.6510553Z ii amdgpu-install 6.4.60401-2164967.22.04 all AMDGPU driver repository and installer 2025-12-04T11:12:46.6528151Z ##[group]Run rocm-smi 2025-12-04T11:12:46.6528279Z rocm-smi 2025-12-04T11:12:46.6532369Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.6532526Z env: 2025-12-04T11:12:46.6532620Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.6532729Z ##[endgroup] 2025-12-04T11:12:46.7160939Z 2025-12-04T11:12:46.7160967Z 2025-12-04T11:12:46.7161122Z ============================================ ROCm System Management Interface ============================================ 2025-12-04T11:12:46.7161344Z ====================================================== Concise Info ====================================================== 2025-12-04T11:12:46.7161575Z Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2025-12-04T11:12:46.7162168Z  (DID, GUID) (Junction) (Socket) (Mem, Compute, ID)  2025-12-04T11:12:46.7162378Z ========================================================================================================================== 2025-12-04T11:12:46.7162979Z 0 3 0x74a5, 51110 27.0°C 120.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T11:12:46.7163241Z 1 5 0x74a5, 2987 27.0°C 118.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T11:12:46.7163497Z 2 4 0x74a5, 61326 27.0°C 118.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T11:12:46.7163751Z 3 2 0x74a5, 9091 28.0°C 121.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T11:12:46.7164108Z ========================================================================================================================== 2025-12-04T11:12:46.7164303Z ================================================== End of ROCm SMI Log =================================================== 2025-12-04T11:12:46.7228348Z ##[group]Run rocminfo 2025-12-04T11:12:46.7228510Z rocminfo 2025-12-04T11:12:46.7233166Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.7233343Z env: 2025-12-04T11:12:46.7233445Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.7233565Z ##[endgroup] 2025-12-04T11:12:46.8115969Z ROCk module version 6.12.12 is loaded 2025-12-04T11:12:46.8116270Z ===================== 2025-12-04T11:12:46.8116432Z HSA System Attributes 2025-12-04T11:12:46.8116573Z ===================== 2025-12-04T11:12:46.8116723Z Runtime Version: 1.15 2025-12-04T11:12:46.8116893Z Runtime Ext Version: 1.7 2025-12-04T11:12:46.8117069Z System Timestamp Freq.: 1000.000000MHz 2025-12-04T11:12:46.8117358Z Sig. 
Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-12-04T11:12:46.8117633Z Machine Model: LARGE 2025-12-04T11:12:46.8117885Z System Endianness: LITTLE 2025-12-04T11:12:46.8118101Z Mwaitx: DISABLED 2025-12-04T11:12:46.8118289Z XNACK enabled: NO 2025-12-04T11:12:46.8118445Z DMAbuf Support: YES 2025-12-04T11:12:46.8118598Z VMM Support: YES 2025-12-04T11:12:46.8118691Z 2025-12-04T11:12:46.8118749Z ========== 2025-12-04T11:12:46.8118891Z HSA Agents 2025-12-04T11:12:46.8119055Z ========== 2025-12-04T11:12:46.8119181Z ******* 2025-12-04T11:12:46.8119317Z Agent 1 2025-12-04T11:12:46.8119450Z ******* 2025-12-04T11:12:46.8119629Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:12:46.8119932Z Uuid: CPU-XX 2025-12-04T11:12:46.8120152Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:12:46.8120370Z Vendor Name: CPU 2025-12-04T11:12:46.8120570Z Feature: None specified 2025-12-04T11:12:46.8120810Z Profile: FULL_PROFILE 2025-12-04T11:12:46.8121015Z Float Round Mode: NEAR 2025-12-04T11:12:46.8121254Z Max Queue Number: 0(0x0) 2025-12-04T11:12:46.8121462Z Queue Min Size: 0(0x0) 2025-12-04T11:12:46.8121661Z Queue Max Size: 0(0x0) 2025-12-04T11:12:46.8121864Z Queue Type: MULTI 2025-12-04T11:12:46.8122048Z Node: 0 2025-12-04T11:12:46.8122266Z Device Type: CPU 2025-12-04T11:12:46.8122455Z Cache Info: 2025-12-04T11:12:46.8122609Z L1: 49152(0xc000) KB 2025-12-04T11:12:46.8122809Z Chip ID: 0(0x0) 2025-12-04T11:12:46.8123299Z ASIC Revision: 0(0x0) 2025-12-04T11:12:46.8123508Z Cacheline Size: 64(0x40) 2025-12-04T11:12:46.8123716Z Max Clock Freq. (MHz): 3300 2025-12-04T11:12:46.8123906Z BDFID: 0 2025-12-04T11:12:46.8124237Z Internal Node ID: 0 2025-12-04T11:12:46.8124446Z Compute Unit: 64 2025-12-04T11:12:46.8124656Z SIMDs per CU: 0 2025-12-04T11:12:46.8124866Z Shader Engines: 0 2025-12-04T11:12:46.8125084Z Shader Arrs. per Eng.: 0 2025-12-04T11:12:46.8125298Z WatchPts on Addr. 
Ranges:1 2025-12-04T11:12:46.8125493Z Memory Properties: 2025-12-04T11:12:46.8125634Z Features: None 2025-12-04T11:12:46.8125780Z Pool Info: 2025-12-04T11:12:46.8126040Z Pool 1 2025-12-04T11:12:46.8126234Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:12:46.8126439Z Size: 1584733176(0x5e751bf8) KB 2025-12-04T11:12:46.8126603Z Allocatable: TRUE 2025-12-04T11:12:46.8126770Z Alloc Granule: 4KB 2025-12-04T11:12:46.8126956Z Alloc Recommended Granule:4KB 2025-12-04T11:12:46.8127132Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8127313Z Accessible by all: TRUE 2025-12-04T11:12:46.8127470Z Pool 2 2025-12-04T11:12:46.8127618Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:12:46.8127787Z Size: 1584733176(0x5e751bf8) KB 2025-12-04T11:12:46.8127959Z Allocatable: TRUE 2025-12-04T11:12:46.8128138Z Alloc Granule: 4KB 2025-12-04T11:12:46.8128321Z Alloc Recommended Granule:4KB 2025-12-04T11:12:46.8128508Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8128688Z Accessible by all: TRUE 2025-12-04T11:12:46.8128840Z Pool 3 2025-12-04T11:12:46.8128978Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T11:12:46.8129144Z Size: 1584733176(0x5e751bf8) KB 2025-12-04T11:12:46.8129304Z Allocatable: TRUE 2025-12-04T11:12:46.8129469Z Alloc Granule: 4KB 2025-12-04T11:12:46.8129644Z Alloc Recommended Granule:4KB 2025-12-04T11:12:46.8129886Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8130060Z Accessible by all: TRUE 2025-12-04T11:12:46.8130209Z Pool 4 2025-12-04T11:12:46.8130372Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:12:46.8130535Z Size: 1584733176(0x5e751bf8) KB 2025-12-04T11:12:46.8130697Z Allocatable: TRUE 2025-12-04T11:12:46.8130870Z Alloc Granule: 4KB 2025-12-04T11:12:46.8131055Z Alloc Recommended Granule:4KB 2025-12-04T11:12:46.8131229Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8131409Z Accessible by all: TRUE 2025-12-04T11:12:46.8131557Z ISA Info: 2025-12-04T11:12:46.8131715Z ******* 2025-12-04T11:12:46.8131823Z Agent 2 2025-12-04T11:12:46.8131932Z ******* 2025-12-04T11:12:46.8132066Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:12:46.8132226Z Uuid: CPU-XX 2025-12-04T11:12:46.8132391Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:12:46.8132563Z Vendor Name: CPU 2025-12-04T11:12:46.8132734Z Feature: None specified 2025-12-04T11:12:46.8132898Z Profile: FULL_PROFILE 2025-12-04T11:12:46.8133066Z Float Round Mode: NEAR 2025-12-04T11:12:46.8133237Z Max Queue Number: 0(0x0) 2025-12-04T11:12:46.8133404Z Queue Min Size: 0(0x0) 2025-12-04T11:12:46.8133568Z Queue Max Size: 0(0x0) 2025-12-04T11:12:46.8133782Z Queue Type: MULTI 2025-12-04T11:12:46.8133940Z Node: 1 2025-12-04T11:12:46.8134094Z Device Type: CPU 2025-12-04T11:12:46.8134244Z Cache Info: 2025-12-04T11:12:46.8134374Z L1: 49152(0xc000) KB 2025-12-04T11:12:46.8134525Z Chip ID: 0(0x0) 2025-12-04T11:12:46.8134684Z ASIC Revision: 0(0x0) 2025-12-04T11:12:46.8134852Z Cacheline Size: 64(0x40) 2025-12-04T11:12:46.8135016Z Max Clock Freq. (MHz): 3300 2025-12-04T11:12:46.8135184Z BDFID: 0 2025-12-04T11:12:46.8135347Z Internal Node ID: 1 2025-12-04T11:12:46.8135516Z Compute Unit: 64 2025-12-04T11:12:46.8135685Z SIMDs per CU: 0 2025-12-04T11:12:46.8135859Z Shader Engines: 0 2025-12-04T11:12:46.8136030Z Shader Arrs. per Eng.: 0 2025-12-04T11:12:46.8136204Z WatchPts on Addr. 
Ranges:1 2025-12-04T11:12:46.8136355Z Memory Properties: 2025-12-04T11:12:46.8136477Z Features: None 2025-12-04T11:12:46.8136590Z Pool Info: 2025-12-04T11:12:46.8136686Z Pool 1 2025-12-04T11:12:46.8136809Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:12:46.8136949Z Size: 1585355620(0x5e7e9b64) KB 2025-12-04T11:12:46.8137089Z Allocatable: TRUE 2025-12-04T11:12:46.8137238Z Alloc Granule: 4KB 2025-12-04T11:12:46.8137395Z Alloc Recommended Granule:4KB 2025-12-04T11:12:46.8137550Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8137698Z Accessible by all: TRUE 2025-12-04T11:12:46.8137829Z Pool 2 2025-12-04T11:12:46.8137951Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:12:46.8138089Z Size: 1585355620(0x5e7e9b64) KB 2025-12-04T11:12:46.8138227Z Allocatable: TRUE 2025-12-04T11:12:46.8138377Z Alloc Granule: 4KB 2025-12-04T11:12:46.8138529Z Alloc Recommended Granule:4KB 2025-12-04T11:12:46.8138683Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8138832Z Accessible by all: TRUE 2025-12-04T11:12:46.8138992Z Pool 3 2025-12-04T11:12:46.8139112Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T11:12:46.8139253Z Size: 1585355620(0x5e7e9b64) KB 2025-12-04T11:12:46.8139393Z Allocatable: TRUE 2025-12-04T11:12:46.8139539Z Alloc Granule: 4KB 2025-12-04T11:12:46.8139759Z Alloc Recommended Granule:4KB 2025-12-04T11:12:46.8139913Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8140063Z Accessible by all: TRUE 2025-12-04T11:12:46.8140189Z Pool 4 2025-12-04T11:12:46.8140308Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:12:46.8140450Z Size: 1585355620(0x5e7e9b64) KB 2025-12-04T11:12:46.8140656Z Allocatable: TRUE 2025-12-04T11:12:46.8140804Z Alloc Granule: 4KB 2025-12-04T11:12:46.8140954Z Alloc Recommended Granule:4KB 2025-12-04T11:12:46.8141109Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8141260Z Accessible by all: TRUE 2025-12-04T11:12:46.8141386Z ISA Info: 2025-12-04T11:12:46.8141481Z ******* 2025-12-04T11:12:46.8141570Z Agent 3 2025-12-04T11:12:46.8141660Z ******* 2025-12-04T11:12:46.8141766Z Name: gfx942 2025-12-04T11:12:46.8141899Z Uuid: GPU-fc3883f959874ad9 2025-12-04T11:12:46.8142044Z Marketing Name: AMD Instinct MI325X 2025-12-04T11:12:46.8142190Z Vendor Name: AMD 2025-12-04T11:12:46.8142338Z Feature: KERNEL_DISPATCH 2025-12-04T11:12:46.8142481Z Profile: BASE_PROFILE 2025-12-04T11:12:46.8142623Z Float Round Mode: NEAR 2025-12-04T11:12:46.8142768Z Max Queue Number: 128(0x80) 2025-12-04T11:12:46.8142912Z Queue Min Size: 64(0x40) 2025-12-04T11:12:46.8143051Z Queue Max Size: 131072(0x20000) 2025-12-04T11:12:46.8143191Z Queue Type: MULTI 2025-12-04T11:12:46.8143325Z Node: 2 2025-12-04T11:12:46.8143456Z Device Type: GPU 2025-12-04T11:12:46.8143581Z Cache Info: 2025-12-04T11:12:46.8143690Z L1: 32(0x20) KB 2025-12-04T11:12:46.8143821Z L2: 4096(0x1000) KB 2025-12-04T11:12:46.8143944Z L3: 262144(0x40000) KB 2025-12-04T11:12:46.8144070Z Chip ID: 29861(0x74a5) 2025-12-04T11:12:46.8144209Z ASIC Revision: 1(0x1) 2025-12-04T11:12:46.8144355Z Cacheline Size: 128(0x80) 2025-12-04T11:12:46.8144497Z Max Clock Freq. (MHz): 2100 2025-12-04T11:12:46.8144635Z BDFID: 29952 2025-12-04T11:12:46.8144775Z Internal Node ID: 2 2025-12-04T11:12:46.8144916Z Compute Unit: 304 2025-12-04T11:12:46.8145057Z SIMDs per CU: 4 2025-12-04T11:12:46.8145196Z Shader Engines: 32 2025-12-04T11:12:46.8145392Z Shader Arrs. per Eng.: 1 2025-12-04T11:12:46.8145544Z WatchPts on Addr. 
Ranges:4 2025-12-04T11:12:46.8145694Z Coherent Host Access: FALSE 2025-12-04T11:12:46.8145828Z Memory Properties: 2025-12-04T11:12:46.8145938Z Features: KERNEL_DISPATCH 2025-12-04T11:12:46.8146072Z Fast F16 Operation: TRUE 2025-12-04T11:12:46.8146219Z Wavefront Size: 64(0x40) 2025-12-04T11:12:46.8146368Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8146506Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8146626Z x 1024(0x400) 2025-12-04T11:12:46.8146746Z y 1024(0x400) 2025-12-04T11:12:46.8146866Z z 1024(0x400) 2025-12-04T11:12:46.8147004Z Max Waves Per CU: 32(0x20) 2025-12-04T11:12:46.8147177Z Max Work-item Per CU: 2048(0x800) 2025-12-04T11:12:46.8147323Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8147449Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8147559Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8147683Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8147805Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8147945Z Max fbarriers/Workgrp: 32 2025-12-04T11:12:46.8150853Z Packet Processor uCode:: 185 2025-12-04T11:12:46.8151016Z SDMA engine uCode:: 24 2025-12-04T11:12:46.8151167Z IOMMU Support:: None 2025-12-04T11:12:46.8151303Z Pool Info: 2025-12-04T11:12:46.8151401Z Pool 1 2025-12-04T11:12:46.8151538Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:12:46.8151682Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8151826Z Allocatable: TRUE 2025-12-04T11:12:46.8151977Z Alloc Granule: 4KB 2025-12-04T11:12:46.8152132Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8152288Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8152441Z Accessible by all: FALSE 2025-12-04T11:12:46.8152573Z Pool 2 2025-12-04T11:12:46.8152697Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:12:46.8152839Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8152985Z Allocatable: TRUE 2025-12-04T11:12:46.8153138Z Alloc Granule: 4KB 2025-12-04T11:12:46.8153290Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8153445Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8153599Z Accessible by all: FALSE 2025-12-04T11:12:46.8153727Z Pool 3 2025-12-04T11:12:46.8153847Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:12:46.8153985Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8154131Z Allocatable: TRUE 2025-12-04T11:12:46.8154283Z Alloc Granule: 4KB 2025-12-04T11:12:46.8154434Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8154665Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8154818Z Accessible by all: FALSE 2025-12-04T11:12:46.8154947Z Pool 4 2025-12-04T11:12:46.8155066Z Segment: GROUP 2025-12-04T11:12:46.8155203Z Size: 64(0x40) KB 2025-12-04T11:12:46.8155339Z Allocatable: FALSE 2025-12-04T11:12:46.8155489Z Alloc Granule: 0KB 2025-12-04T11:12:46.8155641Z Alloc Recommended Granule:0KB 2025-12-04T11:12:46.8155798Z Alloc Alignment: 0KB 2025-12-04T11:12:46.8155948Z Accessible by all: FALSE 2025-12-04T11:12:46.8156076Z ISA Info: 2025-12-04T11:12:46.8156177Z ISA 1 2025-12-04T11:12:46.8156346Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T11:12:46.8156505Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:12:46.8156662Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:12:46.8156813Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8156972Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8157120Z Fast f16: TRUE 2025-12-04T11:12:46.8157264Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8157405Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8157530Z x 1024(0x400) 2025-12-04T11:12:46.8157656Z y 1024(0x400) 2025-12-04T11:12:46.8157782Z z 1024(0x400) 
2025-12-04T11:12:46.8157925Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8158057Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8158175Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8158298Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8158423Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8158564Z FBarrier Max Size: 32 2025-12-04T11:12:46.8158695Z ISA 2 2025-12-04T11:12:46.8158831Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T11:12:46.8159001Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:12:46.8159154Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:12:46.8159308Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8159469Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8159619Z Fast f16: TRUE 2025-12-04T11:12:46.8159810Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8159950Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8160073Z x 1024(0x400) 2025-12-04T11:12:46.8160199Z y 1024(0x400) 2025-12-04T11:12:46.8160319Z z 1024(0x400) 2025-12-04T11:12:46.8160454Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8160584Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8160699Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8160825Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8160984Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8172851Z FBarrier Max Size: 32 2025-12-04T11:12:46.8173008Z ******* 2025-12-04T11:12:46.8173122Z Agent 4 2025-12-04T11:12:46.8173223Z ******* 2025-12-04T11:12:46.8173348Z Name: gfx942 2025-12-04T11:12:46.8173504Z Uuid: GPU-cc3748ee0baeca85 2025-12-04T11:12:46.8173660Z Marketing Name: AMD Instinct MI325X 2025-12-04T11:12:46.8173826Z Vendor Name: AMD 2025-12-04T11:12:46.8173982Z Feature: KERNEL_DISPATCH 2025-12-04T11:12:46.8174133Z Profile: BASE_PROFILE 2025-12-04T11:12:46.8174294Z Float Round Mode: NEAR 2025-12-04T11:12:46.8174452Z Max Queue Number: 128(0x80) 2025-12-04T11:12:46.8174681Z Queue Min Size: 64(0x40) 2025-12-04T11:12:46.8174839Z Queue Max Size: 131072(0x20000) 2025-12-04T11:12:46.8174988Z Queue Type: MULTI 2025-12-04T11:12:46.8175139Z Node: 3 2025-12-04T11:12:46.8175290Z Device Type: GPU 2025-12-04T11:12:46.8175424Z Cache Info: 2025-12-04T11:12:46.8175550Z L1: 32(0x20) KB 2025-12-04T11:12:46.8175686Z L2: 4096(0x1000) KB 2025-12-04T11:12:46.8175825Z L3: 262144(0x40000) KB 2025-12-04T11:12:46.8175970Z Chip ID: 29861(0x74a5) 2025-12-04T11:12:46.8176121Z ASIC Revision: 1(0x1) 2025-12-04T11:12:46.8176287Z Cacheline Size: 128(0x80) 2025-12-04T11:12:46.8176449Z Max Clock Freq. (MHz): 2100 2025-12-04T11:12:46.8176594Z BDFID: 1280 2025-12-04T11:12:46.8176752Z Internal Node ID: 3 2025-12-04T11:12:46.8176910Z Compute Unit: 304 2025-12-04T11:12:46.8177058Z SIMDs per CU: 4 2025-12-04T11:12:46.8177218Z Shader Engines: 32 2025-12-04T11:12:46.8177374Z Shader Arrs. per Eng.: 1 2025-12-04T11:12:46.8177541Z WatchPts on Addr. 
Ranges:4 2025-12-04T11:12:46.8177709Z Coherent Host Access: FALSE 2025-12-04T11:12:46.8177851Z Memory Properties: 2025-12-04T11:12:46.8177977Z Features: KERNEL_DISPATCH 2025-12-04T11:12:46.8178134Z Fast F16 Operation: TRUE 2025-12-04T11:12:46.8178294Z Wavefront Size: 64(0x40) 2025-12-04T11:12:46.8178456Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8178599Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8178728Z x 1024(0x400) 2025-12-04T11:12:46.8178867Z y 1024(0x400) 2025-12-04T11:12:46.8178990Z z 1024(0x400) 2025-12-04T11:12:46.8179136Z Max Waves Per CU: 32(0x20) 2025-12-04T11:12:46.8179295Z Max Work-item Per CU: 2048(0x800) 2025-12-04T11:12:46.8179450Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8179595Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8179803Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8179944Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8180083Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8180230Z Max fbarriers/Workgrp: 32 2025-12-04T11:12:46.8180407Z Packet Processor uCode:: 185 2025-12-04T11:12:46.8180575Z SDMA engine uCode:: 24 2025-12-04T11:12:46.8180731Z IOMMU Support:: None 2025-12-04T11:12:46.8180875Z Pool Info: 2025-12-04T11:12:46.8180991Z Pool 1 2025-12-04T11:12:46.8181122Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:12:46.8181283Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8181437Z Allocatable: TRUE 2025-12-04T11:12:46.8181605Z Alloc Granule: 4KB 2025-12-04T11:12:46.8181812Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8181976Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8182144Z Accessible by all: FALSE 2025-12-04T11:12:46.8182289Z Pool 2 2025-12-04T11:12:46.8182418Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:12:46.8182576Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8182725Z Allocatable: TRUE 2025-12-04T11:12:46.8182888Z Alloc Granule: 4KB 2025-12-04T11:12:46.8183058Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8183220Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8183389Z Accessible by all: FALSE 2025-12-04T11:12:46.8183536Z Pool 3 2025-12-04T11:12:46.8183664Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:12:46.8183817Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8183962Z Allocatable: TRUE 2025-12-04T11:12:46.8184124Z Alloc Granule: 4KB 2025-12-04T11:12:46.8184292Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8184454Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8184620Z Accessible by all: FALSE 2025-12-04T11:12:46.8184765Z Pool 4 2025-12-04T11:12:46.8184891Z Segment: GROUP 2025-12-04T11:12:46.8185040Z Size: 64(0x40) KB 2025-12-04T11:12:46.8185198Z Allocatable: FALSE 2025-12-04T11:12:46.8185354Z Alloc Granule: 0KB 2025-12-04T11:12:46.8185523Z Alloc Recommended Granule:0KB 2025-12-04T11:12:46.8185685Z Alloc Alignment: 0KB 2025-12-04T11:12:46.8185849Z Accessible by all: FALSE 2025-12-04T11:12:46.8185992Z ISA Info: 2025-12-04T11:12:46.8186099Z ISA 1 2025-12-04T11:12:46.8186239Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T11:12:46.8186413Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:12:46.8186575Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:12:46.8186742Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8186944Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8187106Z Fast f16: TRUE 2025-12-04T11:12:46.8187264Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8187407Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8187542Z x 1024(0x400) 2025-12-04T11:12:46.8187681Z y 1024(0x400) 2025-12-04T11:12:46.8187810Z z 1024(0x400) 
2025-12-04T11:12:46.8187958Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8188106Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8188231Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8188374Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8188533Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8188686Z FBarrier Max Size: 32 2025-12-04T11:12:46.8188823Z ISA 2 2025-12-04T11:12:46.8188961Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T11:12:46.8189133Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:12:46.8189283Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:12:46.8189431Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8189584Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8189764Z Fast f16: TRUE 2025-12-04T11:12:46.8189910Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8190053Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8190174Z x 1024(0x400) 2025-12-04T11:12:46.8190300Z y 1024(0x400) 2025-12-04T11:12:46.8190423Z z 1024(0x400) 2025-12-04T11:12:46.8190554Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8190691Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8190805Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8190929Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8191056Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8191194Z FBarrier Max Size: 32 2025-12-04T11:12:46.8191325Z ******* 2025-12-04T11:12:46.8191423Z Agent 5 2025-12-04T11:12:46.8191514Z ******* 2025-12-04T11:12:46.8191628Z Name: gfx942 2025-12-04T11:12:46.8191766Z Uuid: GPU-c0ef8e6d11fbb7b6 2025-12-04T11:12:46.8191915Z Marketing Name: AMD Instinct MI325X 2025-12-04T11:12:46.8192066Z Vendor Name: AMD 2025-12-04T11:12:46.8192207Z Feature: KERNEL_DISPATCH 2025-12-04T11:12:46.8192354Z Profile: BASE_PROFILE 2025-12-04T11:12:46.8192501Z Float Round Mode: NEAR 2025-12-04T11:12:46.8192644Z Max Queue Number: 128(0x80) 2025-12-04T11:12:46.8192787Z Queue Min Size: 64(0x40) 2025-12-04T11:12:46.8192928Z Queue Max Size: 131072(0x20000) 2025-12-04T11:12:46.8193072Z Queue Type: MULTI 2025-12-04T11:12:46.8193208Z Node: 4 2025-12-04T11:12:46.8193381Z Device Type: GPU 2025-12-04T11:12:46.8193514Z Cache Info: 2025-12-04T11:12:46.8193629Z L1: 32(0x20) KB 2025-12-04T11:12:46.8193755Z L2: 4096(0x1000) KB 2025-12-04T11:12:46.8193881Z L3: 262144(0x40000) KB 2025-12-04T11:12:46.8194007Z Chip ID: 29861(0x74a5) 2025-12-04T11:12:46.8194146Z ASIC Revision: 1(0x1) 2025-12-04T11:12:46.8194289Z Cacheline Size: 128(0x80) 2025-12-04T11:12:46.8194433Z Max Clock Freq. (MHz): 2100 2025-12-04T11:12:46.8194573Z BDFID: 25856 2025-12-04T11:12:46.8194711Z Internal Node ID: 4 2025-12-04T11:12:46.8194904Z Compute Unit: 304 2025-12-04T11:12:46.8195043Z SIMDs per CU: 4 2025-12-04T11:12:46.8195186Z Shader Engines: 32 2025-12-04T11:12:46.8195332Z Shader Arrs. per Eng.: 1 2025-12-04T11:12:46.8195483Z WatchPts on Addr. 
Ranges:4 2025-12-04T11:12:46.8195633Z Coherent Host Access: FALSE 2025-12-04T11:12:46.8195768Z Memory Properties: 2025-12-04T11:12:46.8195877Z Features: KERNEL_DISPATCH 2025-12-04T11:12:46.8196008Z Fast F16 Operation: TRUE 2025-12-04T11:12:46.8196254Z Wavefront Size: 64(0x40) 2025-12-04T11:12:46.8196554Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8196742Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8196866Z x 1024(0x400) 2025-12-04T11:12:46.8196992Z y 1024(0x400) 2025-12-04T11:12:46.8197110Z z 1024(0x400) 2025-12-04T11:12:46.8197251Z Max Waves Per CU: 32(0x20) 2025-12-04T11:12:46.8197400Z Max Work-item Per CU: 2048(0x800) 2025-12-04T11:12:46.8197543Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8197677Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8197786Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8197907Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8198028Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8198165Z Max fbarriers/Workgrp: 32 2025-12-04T11:12:46.8198323Z Packet Processor uCode:: 185 2025-12-04T11:12:46.8198477Z SDMA engine uCode:: 24 2025-12-04T11:12:46.8198624Z IOMMU Support:: None 2025-12-04T11:12:46.8198751Z Pool Info: 2025-12-04T11:12:46.8198847Z Pool 1 2025-12-04T11:12:46.8198968Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:12:46.8199111Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8199250Z Allocatable: TRUE 2025-12-04T11:12:46.8199396Z Alloc Granule: 4KB 2025-12-04T11:12:46.8199549Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8199755Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8199903Z Accessible by all: FALSE 2025-12-04T11:12:46.8200080Z Pool 2 2025-12-04T11:12:46.8200200Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:12:46.8200339Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8200476Z Allocatable: TRUE 2025-12-04T11:12:46.8200618Z Alloc Granule: 4KB 2025-12-04T11:12:46.8200768Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8200921Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8201067Z Accessible by all: FALSE 2025-12-04T11:12:46.8201199Z Pool 3 2025-12-04T11:12:46.8201315Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:12:46.8201453Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8201594Z Allocatable: TRUE 2025-12-04T11:12:46.8201772Z Alloc Granule: 4KB 2025-12-04T11:12:46.8201925Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8202076Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8202222Z Accessible by all: FALSE 2025-12-04T11:12:46.8202349Z Pool 4 2025-12-04T11:12:46.8202461Z Segment: GROUP 2025-12-04T11:12:46.8202591Z Size: 64(0x40) KB 2025-12-04T11:12:46.8202727Z Allocatable: FALSE 2025-12-04T11:12:46.8202870Z Alloc Granule: 0KB 2025-12-04T11:12:46.8203021Z Alloc Recommended Granule:0KB 2025-12-04T11:12:46.8203175Z Alloc Alignment: 0KB 2025-12-04T11:12:46.8203323Z Accessible by all: FALSE 2025-12-04T11:12:46.8203453Z ISA Info: 2025-12-04T11:12:46.8203550Z ISA 1 2025-12-04T11:12:46.8203671Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T11:12:46.8203828Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:12:46.8203977Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:12:46.8204127Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8204280Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8204421Z Fast f16: TRUE 2025-12-04T11:12:46.8204563Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8204700Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8204825Z x 1024(0x400) 2025-12-04T11:12:46.8204946Z y 1024(0x400) 2025-12-04T11:12:46.8205066Z z 1024(0x400) 
2025-12-04T11:12:46.8205196Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8205326Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8205436Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8205560Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8205682Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8205816Z FBarrier Max Size: 32 2025-12-04T11:12:46.8205942Z ISA 2 2025-12-04T11:12:46.8206070Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T11:12:46.8206268Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:12:46.8206421Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:12:46.8206568Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8206720Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8206861Z Fast f16: TRUE 2025-12-04T11:12:46.8207001Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8207134Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8207252Z x 1024(0x400) 2025-12-04T11:12:46.8207369Z y 1024(0x400) 2025-12-04T11:12:46.8207486Z z 1024(0x400) 2025-12-04T11:12:46.8207616Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8207747Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8207886Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8208007Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8208127Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8208262Z FBarrier Max Size: 32 2025-12-04T11:12:46.8208384Z ******* 2025-12-04T11:12:46.8208478Z Agent 6 2025-12-04T11:12:46.8208566Z ******* 2025-12-04T11:12:46.8208669Z Name: gfx942 2025-12-04T11:12:46.8208802Z Uuid: GPU-10f755404c07bc49 2025-12-04T11:12:46.8208942Z Marketing Name: AMD Instinct MI325X 2025-12-04T11:12:46.8209089Z Vendor Name: AMD 2025-12-04T11:12:46.8209234Z Feature: KERNEL_DISPATCH 2025-12-04T11:12:46.8209378Z Profile: BASE_PROFILE 2025-12-04T11:12:46.8209522Z Float Round Mode: NEAR 2025-12-04T11:12:46.8209664Z Max Queue Number: 128(0x80) 2025-12-04T11:12:46.8209925Z Queue Min Size: 64(0x40) 2025-12-04T11:12:46.8210064Z Queue Max Size: 131072(0x20000) 2025-12-04T11:12:46.8210200Z Queue Type: MULTI 2025-12-04T11:12:46.8210333Z Node: 5 2025-12-04T11:12:46.8210465Z Device Type: GPU 2025-12-04T11:12:46.8210587Z Cache Info: 2025-12-04T11:12:46.8210695Z L1: 32(0x20) KB 2025-12-04T11:12:46.8210817Z L2: 4096(0x1000) KB 2025-12-04T11:12:46.8210940Z L3: 262144(0x40000) KB 2025-12-04T11:12:46.8211064Z Chip ID: 29861(0x74a5) 2025-12-04T11:12:46.8211200Z ASIC Revision: 1(0x1) 2025-12-04T11:12:46.8211341Z Cacheline Size: 128(0x80) 2025-12-04T11:12:46.8211483Z Max Clock Freq. (MHz): 2100 2025-12-04T11:12:46.8211616Z BDFID: 5376 2025-12-04T11:12:46.8211754Z Internal Node ID: 5 2025-12-04T11:12:46.8211895Z Compute Unit: 304 2025-12-04T11:12:46.8212030Z SIMDs per CU: 4 2025-12-04T11:12:46.8212172Z Shader Engines: 32 2025-12-04T11:12:46.8212316Z Shader Arrs. per Eng.: 1 2025-12-04T11:12:46.8212508Z WatchPts on Addr. 
Ranges:4 2025-12-04T11:12:46.8212658Z Coherent Host Access: FALSE 2025-12-04T11:12:46.8212787Z Memory Properties: 2025-12-04T11:12:46.8212893Z Features: KERNEL_DISPATCH 2025-12-04T11:12:46.8213026Z Fast F16 Operation: TRUE 2025-12-04T11:12:46.8213170Z Wavefront Size: 64(0x40) 2025-12-04T11:12:46.8213315Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8213446Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8213561Z x 1024(0x400) 2025-12-04T11:12:46.8213680Z y 1024(0x400) 2025-12-04T11:12:46.8213794Z z 1024(0x400) 2025-12-04T11:12:46.8213924Z Max Waves Per CU: 32(0x20) 2025-12-04T11:12:46.8214110Z Max Work-item Per CU: 2048(0x800) 2025-12-04T11:12:46.8214252Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8214380Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8214486Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8214605Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8214725Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8214859Z Max fbarriers/Workgrp: 32 2025-12-04T11:12:46.8215016Z Packet Processor uCode:: 185 2025-12-04T11:12:46.8215166Z SDMA engine uCode:: 24 2025-12-04T11:12:46.8215310Z IOMMU Support:: None 2025-12-04T11:12:46.8215435Z Pool Info: 2025-12-04T11:12:46.8215531Z Pool 1 2025-12-04T11:12:46.8215654Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:12:46.8215800Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8215941Z Allocatable: TRUE 2025-12-04T11:12:46.8216087Z Alloc Granule: 4KB 2025-12-04T11:12:46.8216242Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8216393Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8216540Z Accessible by all: FALSE 2025-12-04T11:12:46.8216668Z Pool 2 2025-12-04T11:12:46.8216787Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:12:46.8216927Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8217063Z Allocatable: TRUE 2025-12-04T11:12:46.8217208Z Alloc Granule: 4KB 2025-12-04T11:12:46.8217361Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8217512Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8217661Z Accessible by all: FALSE 2025-12-04T11:12:46.8217788Z Pool 3 2025-12-04T11:12:46.8217904Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:12:46.8218041Z Size: 268419072(0xfffc000) KB 2025-12-04T11:12:46.8218177Z Allocatable: TRUE 2025-12-04T11:12:46.8218320Z Alloc Granule: 4KB 2025-12-04T11:12:46.8218471Z Alloc Recommended Granule:2048KB 2025-12-04T11:12:46.8218621Z Alloc Alignment: 4KB 2025-12-04T11:12:46.8218798Z Accessible by all: FALSE 2025-12-04T11:12:46.8218925Z Pool 4 2025-12-04T11:12:46.8219037Z Segment: GROUP 2025-12-04T11:12:46.8219169Z Size: 64(0x40) KB 2025-12-04T11:12:46.8219302Z Allocatable: FALSE 2025-12-04T11:12:46.8219447Z Alloc Granule: 0KB 2025-12-04T11:12:46.8219600Z Alloc Recommended Granule:0KB 2025-12-04T11:12:46.8219788Z Alloc Alignment: 0KB 2025-12-04T11:12:46.8219935Z Accessible by all: FALSE 2025-12-04T11:12:46.8220063Z ISA Info: 2025-12-04T11:12:46.8220156Z ISA 1 2025-12-04T11:12:46.8220276Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T11:12:46.8220468Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:12:46.8220616Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:12:46.8220764Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8220916Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8221055Z Fast f16: TRUE 2025-12-04T11:12:46.8221195Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8221328Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8221447Z x 1024(0x400) 2025-12-04T11:12:46.8221567Z y 1024(0x400) 2025-12-04T11:12:46.8221687Z z 1024(0x400) 
2025-12-04T11:12:46.8221817Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8221948Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8222060Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8222184Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8222304Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8222439Z FBarrier Max Size: 32 2025-12-04T11:12:46.8222565Z ISA 2 2025-12-04T11:12:46.8222691Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T11:12:46.8222853Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:12:46.8223005Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:12:46.8223152Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8223315Z Default Rounding Mode: NEAR 2025-12-04T11:12:46.8223462Z Fast f16: TRUE 2025-12-04T11:12:46.8223602Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:12:46.8223737Z Workgroup Max Size per Dimension: 2025-12-04T11:12:46.8223855Z x 1024(0x400) 2025-12-04T11:12:46.8223973Z y 1024(0x400) 2025-12-04T11:12:46.8224090Z z 1024(0x400) 2025-12-04T11:12:46.8224220Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:12:46.8224349Z Grid Max Size per Dimension: 2025-12-04T11:12:46.8224459Z x 4294967295(0xffffffff) 2025-12-04T11:12:46.8224580Z y 4294967295(0xffffffff) 2025-12-04T11:12:46.8224700Z z 4294967295(0xffffffff) 2025-12-04T11:12:46.8224872Z FBarrier Max Size: 32 2025-12-04T11:12:46.8224996Z *** Done *** 2025-12-04T11:12:46.8234302Z ##[group]Run ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T11:12:46.8234478Z ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T11:12:46.8234752Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" 2025-12-04T11:12:46.8235006Z if [[ $ngpu -eq 0 ]]; then 2025-12-04T11:12:46.8235150Z  echo "Error: Failed to detect any GPUs on the runner" 2025-12-04T11:12:46.8235290Z  echo "$msg" 2025-12-04T11:12:46.8235389Z  exit 1 2025-12-04T11:12:46.8235479Z fi 2025-12-04T11:12:46.8238253Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.8238392Z env: 2025-12-04T11:12:46.8238478Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.8238577Z ##[endgroup] 2025-12-04T11:12:46.9308334Z ##[group]Run pytorch/pytorch/.github/actions/diskspace-cleanup@main 2025-12-04T11:12:46.9308579Z with: 2025-12-04T11:12:46.9308715Z diskspace-cutoff: 70 2025-12-04T11:12:46.9308852Z env: 2025-12-04T11:12:46.9308981Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.9309117Z ##[endgroup] 2025-12-04T11:12:46.9333935Z ##[group]Run set -ex 2025-12-04T11:12:46.9334077Z set -ex 2025-12-04T11:12:46.9334183Z diskspace_cutoff=70 2025-12-04T11:12:46.9334330Z docker_root_dir=$(docker info -f '{{.DockerRootDir}}') 2025-12-04T11:12:46.9334499Z if [ ! -d "$docker_root_dir" ]; then 2025-12-04T11:12:46.9334704Z  echo "Docker root directory ($docker_root_dir) does not exist. Skipping disk space check." 2025-12-04T11:12:46.9334891Z  exit 0 2025-12-04T11:12:46.9334990Z fi 2025-12-04T11:12:46.9335157Z diskspace=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-12-04T11:12:46.9335500Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" 2025-12-04T11:12:46.9335784Z if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then 2025-12-04T11:12:46.9335931Z  docker system prune -af 2025-12-04T11:12:46.9336127Z  diskspace_new=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-12-04T11:12:46.9336346Z  if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then 2025-12-04T11:12:46.9336509Z  diskspace_cutoff_int=$((diskspace_cutoff + 0)) 2025-12-04T11:12:46.9336673Z  difference=$((100 - diskspace_cutoff_int)) 2025-12-04T11:12:46.9336889Z  echo "Error: Available diskspace is less than $difference percent. Not enough diskspace." 2025-12-04T11:12:46.9337079Z  echo "$msg" 2025-12-04T11:12:46.9337190Z  exit 1 2025-12-04T11:12:46.9337295Z  else 2025-12-04T11:12:46.9337415Z  difference=$((diskspace - diskspace_new)) 2025-12-04T11:12:46.9337575Z  echo "Diskspace saved: $difference percent" 2025-12-04T11:12:46.9337705Z  fi 2025-12-04T11:12:46.9337798Z fi 2025-12-04T11:12:46.9342015Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.9342156Z env: 2025-12-04T11:12:46.9342252Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.9342356Z ##[endgroup] 2025-12-04T11:12:46.9354646Z + diskspace_cutoff=70 2025-12-04T11:12:46.9357126Z ++ docker info -f '{{.DockerRootDir}}' 2025-12-04T11:12:46.9680858Z + docker_root_dir=/home/runner/docker-data 2025-12-04T11:12:46.9681013Z + '[' '!' -d /home/runner/docker-data ']' 2025-12-04T11:12:46.9687357Z ++ df -H --output=pcent /home/runner/docker-data 2025-12-04T11:12:46.9687801Z ++ sed -n 2p 2025-12-04T11:12:46.9688019Z ++ sed s/%// 2025-12-04T11:12:46.9688470Z ++ sed 's/ //' 2025-12-04T11:12:46.9699655Z + diskspace=' 3' 2025-12-04T11:12:46.9700222Z + msg='Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified' 2025-12-04T11:12:46.9700691Z + [[ 3 -ge 70 ]] 2025-12-04T11:12:46.9728668Z ##[group]Run RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-12-04T11:12:46.9728903Z RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-12-04T11:12:46.9729090Z rm -rf "${RUNNER_ARTIFACT_DIR}" 2025-12-04T11:12:46.9729251Z mkdir -p "${RUNNER_ARTIFACT_DIR}" 2025-12-04T11:12:46.9729448Z echo "RUNNER_ARTIFACT_DIR=${RUNNER_ARTIFACT_DIR}" >> "${GITHUB_ENV}" 2025-12-04T11:12:46.9729635Z  2025-12-04T11:12:46.9729816Z RUNNER_TEST_RESULTS_DIR="${RUNNER_TEMP}/test-results" 2025-12-04T11:12:46.9729996Z rm -rf "${RUNNER_TEST_RESULTS_DIR}" 2025-12-04T11:12:46.9730161Z mkdir -p "${RUNNER_TEST_RESULTS_DIR}" 2025-12-04T11:12:46.9730371Z echo "RUNNER_TEST_RESULTS_DIR=${RUNNER_TEST_RESULTS_DIR}" >> "${GITHUB_ENV}" 2025-12-04T11:12:46.9730577Z  2025-12-04T11:12:46.9730888Z RUNNER_DOCS_DIR="${RUNNER_TEMP}/docs" 2025-12-04T11:12:46.9731040Z rm -rf "${RUNNER_DOCS_DIR}" 2025-12-04T11:12:46.9731186Z mkdir -p "${RUNNER_DOCS_DIR}" 2025-12-04T11:12:46.9731364Z echo "RUNNER_DOCS_DIR=${RUNNER_DOCS_DIR}" >> "${GITHUB_ENV}" 2025-12-04T11:12:46.9734834Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.9734998Z env: 2025-12-04T11:12:46.9735097Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.9735216Z ##[endgroup] 2025-12-04T11:12:46.9805741Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:12:46.9805982Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:12:46.9806169Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:12:46.9810543Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.9810702Z env: 2025-12-04T11:12:46.9810816Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.9810954Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:46.9811124Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:46.9811292Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:46.9811423Z ##[endgroup] 2025-12-04T11:12:46.9873634Z ##[group]Run # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-12-04T11:12:46.9874031Z # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-12-04T11:12:46.9874285Z # Add render group for container creation. 2025-12-04T11:12:46.9874502Z render_gid=`cat /etc/group | grep render | cut -d: -f3` 2025-12-04T11:12:46.9874760Z # Ensure GPU isolation if pod is part of kubernetes setup with DEVICE_FLAG. 2025-12-04T11:12:46.9875006Z if [ -f "/etc/podinfo/gha-render-devices" ]; then 2025-12-04T11:12:46.9875251Z  DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices) 2025-12-04T11:12:46.9875427Z else 2025-12-04T11:12:46.9875558Z  DEVICE_FLAG="--device /dev/dri" 2025-12-04T11:12:46.9875708Z fi 2025-12-04T11:12:46.9875927Z # The --group-add daemon and --group-add bin are needed in the Ubuntu 24.04 and Almalinux OSs respectively. 2025-12-04T11:12:46.9876276Z # This is due to the device files (/dev/kfd & /dev/dri) being owned by video group on bare metal. 2025-12-04T11:12:46.9876594Z # This video group ID maps to subgid 1 inside the docker image due to the /etc/subgid entries. 2025-12-04T11:12:46.9876933Z # The group name corresponding to group ID 1 can change depending on the OS, so both are necessary. 
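# A few additional notes on the flags assembled in the echo below:
# --device=/dev/kfd exposes the ROCm kernel compute interface to the container,
# and the render nodes in $DEVICE_FLAG (/dev/dri/...) expose the GPUs themselves.
# --cap-add=SYS_PTRACE with --security-opt seccomp=unconfined lets debuggers and
# sanitizers attach to test processes inside the container.
# --network=host shares the host network namespace, which distributed tests can
# use for rendezvous between worker processes.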
2025-12-04T11:12:46.9877483Z echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd $DEVICE_FLAG --group-add video --group-add $render_gid --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host" >> "${GITHUB_ENV}" 2025-12-04T11:12:46.9882512Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:46.9882662Z env: 2025-12-04T11:12:46.9882756Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.9882899Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:46.9883088Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:46.9896790Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:46.9896927Z ##[endgroup] 2025-12-04T11:12:46.9963817Z ##[group]Run aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 2025-12-04T11:12:46.9964020Z with: 2025-12-04T11:12:46.9964170Z role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only 2025-12-04T11:12:46.9964342Z aws-region: us-east-1 2025-12-04T11:12:46.9964456Z role-duration-seconds: 18000 2025-12-04T11:12:46.9964579Z audience: sts.amazonaws.com 2025-12-04T11:12:46.9964703Z env: 2025-12-04T11:12:46.9964798Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:46.9965061Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:46.9965236Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:46.9965402Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:46.9965901Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:46.9966373Z ##[endgroup] 2025-12-04T11:12:47.2779104Z Assuming role with OIDC 2025-12-04T11:12:47.6138318Z Authenticated as assumedRoleId AROAUPVRELQNLLCOPFEJR:GitHubActions 2025-12-04T11:12:47.7093615Z ##[group]Run aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 2025-12-04T11:12:47.7093859Z with: 2025-12-04T11:12:47.7093974Z mask-password: true 2025-12-04T11:12:47.7094122Z registry-type: private 2025-12-04T11:12:47.7094250Z skip-logout: false 2025-12-04T11:12:47.7094363Z env: 2025-12-04T11:12:47.7094471Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:47.7094631Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:47.7094831Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:47.7095020Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:47.7095589Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:47.7096143Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:47.7096283Z AWS_REGION: us-east-1 2025-12-04T11:12:47.7096749Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:47.7096927Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:47.7099191Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:47.7099301Z ##[endgroup] 2025-12-04T11:12:48.1308441Z Logging into registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:48.7622186Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 
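# Note: the GPU_FLAG assembled above is what later steps hand to `docker run`
# so the ROCm container can reach the GPUs: /dev/kfd is the compute interface,
# the renderD* nodes under /dev/dri are the per-GPU devices, and the container
# user must join the video and render groups that own those device files. This
# job took the Kubernetes branch, so DEVICE_FLAG expands to four explicit
# renderD* nodes and the render gid resolves to 110, as the resolved GPU_FLAG
# echoed into the surrounding env blocks shows. A trimmed sketch of the same
# composition (getent replaces the grep over /etc/group; /dev/mem and the
# daemon/bin groups from the full flag are omitted; the image in the usage
# line is an assumed public ROCm base image, not taken from this log):
render_gid=$(getent group render | cut -d: -f3)        # 110 on this runner
if [ -f /etc/podinfo/gha-render-devices ]; then
  DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices)   # k8s: only this pod's GPUs
else
  DEVICE_FLAG="--device /dev/dri"                      # bare metal: all render nodes
fi
GPU_FLAG="--device=/dev/kfd ${DEVICE_FLAG} --group-add video --group-add ${render_gid}"
docker run --rm ${GPU_FLAG} rocm/dev-ubuntu-22.04 rocminfo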
2025-12-04T11:12:48.7622443Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:12:48.7622642Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:12:48.7622836Z env | grep '^RUNNER' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:12:48.7627036Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:48.7627186Z env: 2025-12-04T11:12:48.7627290Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:48.7627433Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:48.7627618Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:48.7627927Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:48.7628451Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:48.7628945Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:48.7629070Z AWS_REGION: us-east-1 2025-12-04T11:12:48.7629309Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:48.7629469Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:48.7631692Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:48.7631802Z ##[endgroup] 2025-12-04T11:12:48.7726765Z ##[group]Run ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T11:12:48.7726985Z ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T11:12:48.7727235Z if [[ $ngpu -lt 2 ]]; then # We are temporarily reducing this down to 2 from 4 so that we can run tests on nodes with fewer GPUs. 2025-12-04T11:12:48.7727540Z  echo "Error: only $ngpu GPU(s) detected, at least 2 GPUs are needed for distributed jobs" 2025-12-04T11:12:48.7727730Z  exit 1 2025-12-04T11:12:48.7727830Z fi 2025-12-04T11:12:48.7732174Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:48.7732325Z env: 2025-12-04T11:12:48.7732430Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:48.7732572Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:48.7732755Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:48.7732927Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:48.7733460Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:48.7733959Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:48.7734088Z AWS_REGION: us-east-1 2025-12-04T11:12:48.7734339Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:48.7734503Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:48.7736640Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:48.7736755Z ##[endgroup] 2025-12-04T11:12:48.8767152Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T11:12:48.8767336Z with: 2025-12-04T11:12:48.8767614Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:48.8767923Z use-custom-docker-registry: true 2025-12-04T11:12:48.8768055Z docker-build-dir: .ci/docker 2025-12-04T11:12:48.8768182Z docker-build-script: ./build.sh 2025-12-04T11:12:48.8768308Z working-directory: .
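# Note: the rocminfo preflight above counts GPU agents by matching Name: lines
# that carry a gfx ISA string; CPU agents report a CPU model name instead, so
# grep -c yields the number of visible GPUs, and the shard refuses to start
# with fewer than two. A parametrized variant of the same check (the function
# name is illustrative, not from this workflow):
require_gpus() {
  local min="$1" ngpu
  ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx')   # one match per GPU agent
  if [[ "${ngpu}" -lt "${min}" ]]; then
    echo "Error: only ${ngpu} GPU(s) detected, at least ${min} are needed" >&2
    return 1
  fi
}
require_gpus 2   # distributed jobs need at least 2 GPUs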
2025-12-04T11:12:48.8768455Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:48.8768619Z force-push: false 2025-12-04T11:12:48.8768723Z env: 2025-12-04T11:12:48.8768821Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:48.8768965Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:48.8769144Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:48.8769331Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:48.8769900Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:48.8770390Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:48.8770510Z AWS_REGION: us-east-1 2025-12-04T11:12:48.8770653Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:48.8770814Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:48.8773056Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:48.8773167Z ##[endgroup] 2025-12-04T11:12:48.8781708Z ##[group]Run set -ex 2025-12-04T11:12:48.8781838Z set -ex 2025-12-04T11:12:48.8781937Z  2025-12-04T11:12:48.8782101Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T11:12:48.8782354Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-12-04T11:12:48.8782568Z # job could then download the pre-built image as usual 2025-12-04T11:12:48.8782826Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T11:12:48.8783068Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8783204Z else 2025-12-04T11:12:48.8783319Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8783496Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8783659Z  2025-12-04T11:12:48.8783870Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 
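# Note on the branch below: the `exit 0` ends the early-return path (no Docker
# build script, or the custom registry disabled), in which the image name is
# passed through unchanged. Otherwise the action either reuses the tag already
# embedded in a fully-qualified image name -- the path this job takes, per the
# trace further down -- or derives a fresh tag from
# `git rev-parse HEAD:${DOCKER_BUILD_DIR}`, the git tree hash of .ci/docker,
# so the tag changes exactly when the Docker build inputs change.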
2025-12-04T11:12:48.8784105Z  exit 0 2025-12-04T11:12:48.8784203Z fi 2025-12-04T11:12:48.8784300Z  2025-12-04T11:12:48.8784443Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T11:12:48.8784673Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T11:12:48.8784878Z  # use it as it is, but first let's extract the tag 2025-12-04T11:12:48.8785067Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T11:12:48.8785266Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8785454Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8785615Z else 2025-12-04T11:12:48.8785732Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T11:12:48.8785887Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T11:12:48.8786045Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T11:12:48.8786178Z  fi 2025-12-04T11:12:48.8786437Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T11:12:48.8786670Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8786915Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8787175Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8787341Z fi 2025-12-04T11:12:48.8789922Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:48.8790073Z env: 2025-12-04T11:12:48.8790177Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:48.8790320Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:48.8790503Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:48.8790675Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:48.8791181Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:48.8791669Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:48.8791794Z AWS_REGION: us-east-1 2025-12-04T11:12:48.8791939Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:48.8792100Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:48.8794223Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:48.8794336Z REPO_NAME: pytorch 2025-12-04T11:12:48.8794618Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:48.8794959Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T11:12:48.8795085Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T11:12:48.8795247Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:48.8795415Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T11:12:48.8795543Z CUSTOM_TAG_PREFIX: 2025-12-04T11:12:48.8795653Z ##[endgroup] 2025-12-04T11:12:48.8813819Z + [[ -d .ci/docker ]] 2025-12-04T11:12:48.8813969Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T11:12:48.8814096Z + [[ true == \t\r\u\e ]] 2025-12-04T11:12:48.8814204Z + echo skip=false 2025-12-04T11:12:48.8815634Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a == 
*\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T11:12:48.8820787Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:48.8821408Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T11:12:48.8832112Z + DOCKER_TAG=pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:48.8832446Z + echo docker-tag=pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:48.8832840Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:48.8873474Z ##[group]Run set +e 2025-12-04T11:12:48.8873634Z set +e 2025-12-04T11:12:48.8873733Z set -x 2025-12-04T11:12:48.8873829Z  2025-12-04T11:12:48.8873920Z login() { 2025-12-04T11:12:48.8874121Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T11:12:48.8874321Z } 2025-12-04T11:12:48.8874428Z  2025-12-04T11:12:48.8874518Z retry () { 2025-12-04T11:12:48.8874634Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T11:12:48.8874765Z } 2025-12-04T11:12:48.8874852Z  2025-12-04T11:12:48.8874951Z retry login "${DOCKER_REGISTRY}" 2025-12-04T11:12:48.8875074Z  2025-12-04T11:12:48.8875297Z START_TIME=$(date +%s) 2025-12-04T11:12:48.8875425Z # Wait up to 120 minutes 2025-12-04T11:12:48.8875576Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T11:12:48.8875766Z  # Check if image already exists, if it does then skip building it 2025-12-04T11:12:48.8875959Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T11:12:48.8876106Z  exit 0 2025-12-04T11:12:48.8876206Z  fi 2025-12-04T11:12:48.8876300Z  2025-12-04T11:12:48.8876453Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T11:12:48.8876704Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T11:12:48.8876951Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T11:12:48.8877151Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T11:12:48.8877317Z  # It's a Docker build job, let's build the image 2025-12-04T11:12:48.8877454Z  break 2025-12-04T11:12:48.8877548Z  else 2025-12-04T11:12:48.8877682Z  # It's a regular build job, wait for the image to become available 2025-12-04T11:12:48.8877836Z  sleep 300 2025-12-04T11:12:48.8877937Z  fi 2025-12-04T11:12:48.8878026Z done 2025-12-04T11:12:48.8878115Z  2025-12-04T11:12:48.8878256Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T11:12:48.8878468Z # be empty. 
The default action would be to continue to rebuild the image 2025-12-04T11:12:48.8879996Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T11:12:48.8880167Z  # if we're on the base branch then use the parent commit 2025-12-04T11:12:48.8880323Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T11:12:48.8880456Z else 2025-12-04T11:12:48.8880591Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T11:12:48.8880775Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T11:12:48.8880916Z fi 2025-12-04T11:12:48.8881008Z  2025-12-04T11:12:48.8881108Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T11:12:48.8881253Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8881386Z  2025-12-04T11:12:48.8881560Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T11:12:48.8881769Z  exit 0 2025-12-04T11:12:48.8881861Z fi 2025-12-04T11:12:48.8881953Z  2025-12-04T11:12:48.8882080Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T11:12:48.8882331Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T11:12:48.8882545Z  exit 1 2025-12-04T11:12:48.8882641Z fi 2025-12-04T11:12:48.8882727Z  2025-12-04T11:12:48.8882867Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T11:12:48.8883106Z # If no image exists but the hash is the same as the previous hash then something has gone wrong upstream; warn and rebuild locally 2025-12-04T11:12:48.8883320Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T11:12:48.8883568Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T11:12:48.8883843Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T11:12:48.8884013Z fi 2025-12-04T11:12:48.8884102Z  2025-12-04T11:12:48.8884211Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T11:12:48.8888255Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:48.8888448Z env: 2025-12-04T11:12:48.8888546Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:48.8888681Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:48.8888854Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:48.8889018Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:48.8889516Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:48.8890046Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:48.8890167Z AWS_REGION: us-east-1 2025-12-04T11:12:48.8890404Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:48.8890558Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:48.8892684Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:48.8892803Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T11:12:48.8892943Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:12:48.8893261Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:48.8893622Z DOCKER_TAG: pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:48.8893856Z DOCKER_REGISTRY:
308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:48.8894012Z DOCKER_PUSH: 2025-12-04T11:12:48.8894110Z ##[endgroup] 2025-12-04T11:12:48.8908692Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:48.8908951Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:48.8910666Z + aws ecr get-login-password --region us-east-1 2025-12-04T11:12:48.8911107Z /home/runner/_work/_temp/377a28ba-4a75-44bd-ad6a-734483655b15.sh: line 5: aws: command not found 2025-12-04T11:12:48.8911721Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:48.9004403Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T11:12:48.9013177Z + sleep 1 2025-12-04T11:12:49.9024132Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:49.9029354Z + aws ecr get-login-password --region us-east-1 2025-12-04T11:12:49.9030086Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:49.9030820Z /home/runner/_work/_temp/377a28ba-4a75-44bd-ad6a-734483655b15.sh: line 5: aws: command not found 2025-12-04T11:12:49.9120512Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T11:12:49.9133786Z + sleep 2 2025-12-04T11:12:51.9149666Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:51.9153449Z + aws ecr get-login-password --region us-east-1 2025-12-04T11:12:51.9153702Z /home/runner/_work/_temp/377a28ba-4a75-44bd-ad6a-734483655b15.sh: line 5: aws: command not found 2025-12-04T11:12:51.9156719Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:51.9240071Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T11:12:51.9254058Z ++ date +%s 2025-12-04T11:12:51.9262864Z + START_TIME=1764846771 2025-12-04T11:12:51.9267020Z ++ date +%s 2025-12-04T11:12:51.9276210Z + [[ 1764839571 -lt 1764846771 ]] 2025-12-04T11:12:51.9276718Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:53.4641198Z { 2025-12-04T11:12:53.4641734Z "schemaVersion": 2, 2025-12-04T11:12:53.4642320Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T11:12:53.4642818Z "config": { 2025-12-04T11:12:53.4643203Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T11:12:53.4643636Z "size": 30522, 2025-12-04T11:12:53.4644088Z "digest": "sha256:79498ef00fdf8abfcde955fd685c3a7412c33ca80383b5905abfdc3c70621215" 2025-12-04T11:12:53.4645286Z }, 2025-12-04T11:12:53.4645511Z "layers": [ 2025-12-04T11:12:53.4645748Z { 2025-12-04T11:12:53.4646110Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4646528Z "size": 30594402, 2025-12-04T11:12:53.4646947Z "digest": "sha256:02de03a7213b62b792ec66a7efb8c86c4117ca00fb8651facf8ecfe33044b485" 2025-12-04T11:12:53.4647296Z }, 2025-12-04T11:12:53.4647450Z { 2025-12-04T11:12:53.4647703Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4648007Z "size": 1554, 2025-12-04T11:12:53.4648316Z "digest": "sha256:3a5718b5258e28918133dd74ea64bd506b2c15530a2fa8a72c45c5b0d8f7c7b0" 2025-12-04T11:12:53.4648664Z }, 2025-12-04T11:12:53.4648816Z { 2025-12-04T11:12:53.4649060Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4649368Z "size": 335779211, 2025-12-04T11:12:53.4649762Z "digest": 
"sha256:bf3aa22776924a41b55849f0f30cb22af45d41da1177a9d682cf94cde99d8f98" 2025-12-04T11:12:53.4650112Z }, 2025-12-04T11:12:53.4650265Z { 2025-12-04T11:12:53.4650515Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4650816Z "size": 704, 2025-12-04T11:12:53.4651122Z "digest": "sha256:9d58e5257cefd43e8226153d71d28a865253662146aa9fce9a9f95af67b497fa" 2025-12-04T11:12:53.4651451Z }, 2025-12-04T11:12:53.4651602Z { 2025-12-04T11:12:53.4651850Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4652155Z "size": 1770, 2025-12-04T11:12:53.4652465Z "digest": "sha256:fde80a64553533a56c032d4bc388837e7d4631a0424d1bfe135703165b67fd4d" 2025-12-04T11:12:53.4652951Z }, 2025-12-04T11:12:53.4653099Z { 2025-12-04T11:12:53.4653343Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4653646Z "size": 485, 2025-12-04T11:12:53.4653956Z "digest": "sha256:6931c5f20e80e481e4f484471ff3a02878b4f8c54a9a5a4717213fdaa35c0bff" 2025-12-04T11:12:53.4654298Z }, 2025-12-04T11:12:53.4654457Z { 2025-12-04T11:12:53.4654707Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4655012Z "size": 120663474, 2025-12-04T11:12:53.4655335Z "digest": "sha256:170ea6d3edd62991e37d2e6ebe53dfcd4601f5d42e8f9720af5f8db5fc267856" 2025-12-04T11:12:53.4655684Z }, 2025-12-04T11:12:53.4655835Z { 2025-12-04T11:12:53.4656081Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4656381Z "size": 4433, 2025-12-04T11:12:53.4656667Z "digest": "sha256:dc8487f6c81cac00fa33031f8d3481e2c3634c4f064a9c4c36b87b41e78bc9fb" 2025-12-04T11:12:53.4656946Z }, 2025-12-04T11:12:53.4657058Z { 2025-12-04T11:12:53.4657234Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4657451Z "size": 1755, 2025-12-04T11:12:53.4657674Z "digest": "sha256:9748c5348f39a11c960c49fd9219fdea1c23e612ed11a02d71501424defc80f5" 2025-12-04T11:12:53.4657911Z }, 2025-12-04T11:12:53.4658020Z { 2025-12-04T11:12:53.4658203Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4658419Z "size": 724, 2025-12-04T11:12:53.4658648Z "digest": "sha256:8539cc3f8d8a138501ed0255c0cd7ec491bc0add9e4a62095f1c0f9533daa1cc" 2025-12-04T11:12:53.4658898Z }, 2025-12-04T11:12:53.4659009Z { 2025-12-04T11:12:53.4659186Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4659406Z "size": 3378352584, 2025-12-04T11:12:53.4659643Z "digest": "sha256:af88f886884fe6f1a1992efb7ce8473901f795eef69caa199443f3e076fdfd5b" 2025-12-04T11:12:53.4659946Z }, 2025-12-04T11:12:53.4660057Z { 2025-12-04T11:12:53.4660235Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4660450Z "size": 396, 2025-12-04T11:12:53.4660670Z "digest": "sha256:32fbb88555c4195c45c7008cf92e389d67acc79a7e382503003ef93bcb886afe" 2025-12-04T11:12:53.4660915Z }, 2025-12-04T11:12:53.4661024Z { 2025-12-04T11:12:53.4661384Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4661602Z "size": 80171601, 2025-12-04T11:12:53.4661831Z "digest": "sha256:3231e1ab814b143b244037c540b637be259085834865ac43b1ed2b6f6ad631e1" 2025-12-04T11:12:53.4662071Z }, 2025-12-04T11:12:53.4662184Z { 2025-12-04T11:12:53.4662358Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4662573Z "size": 787, 2025-12-04T11:12:53.4662802Z "digest": "sha256:80061bf5dcbb9a4e38ac865a9cdc0a615bb294e3e6bfa357a6d515dcf3f54abc" 
2025-12-04T11:12:53.4663054Z }, 2025-12-04T11:12:53.4663165Z { 2025-12-04T11:12:53.4663382Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4663785Z "size": 106, 2025-12-04T11:12:53.4664091Z "digest": "sha256:6e9524f4518ec02b47ff12c55b6b6afbc65b3f4be59072e2afe20c2c87522549" 2025-12-04T11:12:53.4664340Z }, 2025-12-04T11:12:53.4664449Z { 2025-12-04T11:12:53.4664632Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4664855Z "size": 1495, 2025-12-04T11:12:53.4665106Z "digest": "sha256:ce919d4bf5eeff71d49b160a16603117225530497c3905e02224227d11e2ff88" 2025-12-04T11:12:53.4665344Z }, 2025-12-04T11:12:53.4665458Z { 2025-12-04T11:12:53.4665633Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4665853Z "size": 548601195, 2025-12-04T11:12:53.4666084Z "digest": "sha256:47681e3e6f37423139a5c86549ffbb43e4f258344b0461208f5821263da152e9" 2025-12-04T11:12:53.4666322Z }, 2025-12-04T11:12:53.4666434Z { 2025-12-04T11:12:53.4666593Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4666847Z "size": 162, 2025-12-04T11:12:53.4667027Z "digest": "sha256:cb70fe22c9ebacebfe8402519059c8a66da6d5a77979e4c0ecdb3a762bebe357" 2025-12-04T11:12:53.4667225Z }, 2025-12-04T11:12:53.4667312Z { 2025-12-04T11:12:53.4667453Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4667630Z "size": 104, 2025-12-04T11:12:53.4667804Z "digest": "sha256:17858e829c8cfe9a7e22516e03ad5273d8cf5c50f58edb10ff60c74e15c8e1f6" 2025-12-04T11:12:53.4668002Z }, 2025-12-04T11:12:53.4668090Z { 2025-12-04T11:12:53.4668229Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4668398Z "size": 724, 2025-12-04T11:12:53.4668570Z "digest": "sha256:8539cc3f8d8a138501ed0255c0cd7ec491bc0add9e4a62095f1c0f9533daa1cc" 2025-12-04T11:12:53.4668761Z }, 2025-12-04T11:12:53.4668849Z { 2025-12-04T11:12:53.4668984Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4669155Z "size": 196, 2025-12-04T11:12:53.4669329Z "digest": "sha256:a63f3b4eed1157bcb3c51b64196e74e9f10d1f923652b02fd433c6ed993597ff" 2025-12-04T11:12:53.4669521Z }, 2025-12-04T11:12:53.4669606Z { 2025-12-04T11:12:53.4669785Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4669954Z "size": 2584, 2025-12-04T11:12:53.4670138Z "digest": "sha256:10ab3d1afbc4cb2d3ced8f3e0072c0b1dd124dcadcf68b95fadf8a7a9f663860" 2025-12-04T11:12:53.4670337Z }, 2025-12-04T11:12:53.4670422Z { 2025-12-04T11:12:53.4670561Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4670733Z "size": 7652105336, 2025-12-04T11:12:53.4670912Z "digest": "sha256:98ca88b5095b449a2f2d753a21217856271912fbe51c2d99f928a2196f4097d5" 2025-12-04T11:12:53.4671100Z }, 2025-12-04T11:12:53.4671186Z { 2025-12-04T11:12:53.4671326Z + exit 0 2025-12-04T11:12:53.4671470Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4671644Z "size": 135, 2025-12-04T11:12:53.4671814Z "digest": "sha256:025c90839a58c768b3cc444e48cae67c1a5b2c85320ad8827231f0ba390cf9aa" 2025-12-04T11:12:53.4672003Z }, 2025-12-04T11:12:53.4672091Z { 2025-12-04T11:12:53.4672232Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4672416Z "size": 104, 2025-12-04T11:12:53.4672635Z "digest": "sha256:9255df5942ae69fee24f8074314f451d5d2f1ca71b6c777274297fd43a0032d8" 2025-12-04T11:12:53.4672827Z }, 
2025-12-04T11:12:53.4672912Z { 2025-12-04T11:12:53.4673051Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4673219Z "size": 612, 2025-12-04T11:12:53.4673395Z "digest": "sha256:f71ca9d4ed1c4ca8177602f3cb0db83d9787ea6c258a8ef203387b308ff3e0f0" 2025-12-04T11:12:53.4673586Z }, 2025-12-04T11:12:53.4673672Z { 2025-12-04T11:12:53.4673810Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4673976Z "size": 838191953, 2025-12-04T11:12:53.4674159Z "digest": "sha256:d02b47b56ca7f3598f5943d4fdc7139d5e3d3bc82d49185cedf9817dd55fc75c" 2025-12-04T11:12:53.4674350Z }, 2025-12-04T11:12:53.4674437Z { 2025-12-04T11:12:53.4674577Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4674745Z "size": 111, 2025-12-04T11:12:53.4674920Z "digest": "sha256:40279492aea7bc8fb650842b495912195621c21b14cef4c717a9e0a9fc535131" 2025-12-04T11:12:53.4675106Z }, 2025-12-04T11:12:53.4675191Z { 2025-12-04T11:12:53.4675329Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4675497Z "size": 1556, 2025-12-04T11:12:53.4675673Z "digest": "sha256:33a27ce74abd7e32a03a564fc45005bc75904b53ad516f18d47facbeb2f2794e" 2025-12-04T11:12:53.4675866Z }, 2025-12-04T11:12:53.4675951Z { 2025-12-04T11:12:53.4676091Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4676259Z "size": 107, 2025-12-04T11:12:53.4676437Z "digest": "sha256:6b66ed335d1d8df6140caba76d9c2babed83bb37962e1e638825d49e67184fa5" 2025-12-04T11:12:53.4676655Z }, 2025-12-04T11:12:53.4676734Z { 2025-12-04T11:12:53.4676858Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4677010Z "size": 166, 2025-12-04T11:12:53.4677166Z "digest": "sha256:9f010fa04118bfee2d7b4481e6badb714032bde0652b04151a6599e57e1bd91b" 2025-12-04T11:12:53.4677343Z }, 2025-12-04T11:12:53.4677421Z { 2025-12-04T11:12:53.4677545Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4677697Z "size": 3702493, 2025-12-04T11:12:53.4677859Z "digest": "sha256:6c64d5e8bb6ae6ef4e3f1d316429d8b14a6e8a1fb410fb83b96c8bbd4a0a095c" 2025-12-04T11:12:53.4678033Z }, 2025-12-04T11:12:53.4678110Z { 2025-12-04T11:12:53.4678234Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4678387Z "size": 107, 2025-12-04T11:12:53.4678544Z "digest": "sha256:c20ea058f549f5f5538c95c5e0da23afbbc9fb7ffc1987d126fe684eeed743f5" 2025-12-04T11:12:53.4678725Z }, 2025-12-04T11:12:53.4678800Z { 2025-12-04T11:12:53.4678924Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4679077Z "size": 829, 2025-12-04T11:12:53.4679234Z "digest": "sha256:3c4fd2d54638a1336d39769fe36041aa6d186a8dea0e7096b8d8a7068ba0d3c0" 2025-12-04T11:12:53.4679405Z }, 2025-12-04T11:12:53.4679488Z { 2025-12-04T11:12:53.4679614Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4679806Z "size": 26673844, 2025-12-04T11:12:53.4679968Z "digest": "sha256:964ebac3d7a95c64ea7f0d828cd58e6244cc955e9a099a2525079ecf64026e3f" 2025-12-04T11:12:53.4680142Z }, 2025-12-04T11:12:53.4680219Z { 2025-12-04T11:12:53.4680345Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4680498Z "size": 104, 2025-12-04T11:12:53.4680657Z "digest": "sha256:2aaa7210673fc5bd15d36e54ee5c3fb495d1eafa1cb8d686054ccedb1c37bfc8" 2025-12-04T11:12:53.4680837Z }, 2025-12-04T11:12:53.4680916Z { 2025-12-04T11:12:53.4681040Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4681192Z "size": 424, 2025-12-04T11:12:53.4681348Z "digest": "sha256:fa273daa00371a98ed668535e14b8cc3cb425feba0b601b3e3c72314d0234312" 2025-12-04T11:12:53.4681521Z }, 2025-12-04T11:12:53.4681598Z { 2025-12-04T11:12:53.4681767Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4681921Z "size": 19279582, 2025-12-04T11:12:53.4682086Z "digest": "sha256:d931a62fd2408369decfa0e6eac11768e35d0ffddee87d769c82aaf1ad7e2899" 2025-12-04T11:12:53.4682260Z }, 2025-12-04T11:12:53.4682338Z { 2025-12-04T11:12:53.4682463Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4682616Z "size": 826, 2025-12-04T11:12:53.4682768Z "digest": "sha256:d3573d61c28e1400840260d3c2c786c9e104f6558162beac799e55b6f5c1e747" 2025-12-04T11:12:53.4682937Z }, 2025-12-04T11:12:53.4683017Z { 2025-12-04T11:12:53.4683143Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4683295Z "size": 724, 2025-12-04T11:12:53.4683450Z "digest": "sha256:8539cc3f8d8a138501ed0255c0cd7ec491bc0add9e4a62095f1c0f9533daa1cc" 2025-12-04T11:12:53.4683621Z }, 2025-12-04T11:12:53.4683697Z { 2025-12-04T11:12:53.4683826Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4683977Z "size": 149, 2025-12-04T11:12:53.4684132Z "digest": "sha256:f9b32f08c49055dd61bd359d5f42f6adb9e5a183c2821d97d11572dd7ce1e91f" 2025-12-04T11:12:53.4684306Z }, 2025-12-04T11:12:53.4684384Z { 2025-12-04T11:12:53.4684508Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4684659Z "size": 136, 2025-12-04T11:12:53.4684812Z "digest": "sha256:3a0206399d60f6e8897f78c8e8f81b59d51969a329ef45485d28ae19607ca72c" 2025-12-04T11:12:53.4684982Z }, 2025-12-04T11:12:53.4685060Z { 2025-12-04T11:12:53.4685186Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4685375Z "size": 140, 2025-12-04T11:12:53.4685531Z "digest": "sha256:386f322edd1c1c275126bab065c22fcd3950916c1fb8491a21a7f5c358af599a" 2025-12-04T11:12:53.4685701Z }, 2025-12-04T11:12:53.4685780Z { 2025-12-04T11:12:53.4685909Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4686067Z "size": 32, 2025-12-04T11:12:53.4686225Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T11:12:53.4686398Z }, 2025-12-04T11:12:53.4686475Z { 2025-12-04T11:12:53.4686600Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4686754Z "size": 223, 2025-12-04T11:12:53.4686909Z "digest": "sha256:bbe49df30697f6959cd958299909d9255cd54663ce2e9e2c2d378f8f9dfe8345" 2025-12-04T11:12:53.4687080Z }, 2025-12-04T11:12:53.4687157Z { 2025-12-04T11:12:53.4687281Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4687435Z "size": 346, 2025-12-04T11:12:53.4687592Z "digest": "sha256:d6630aa6f375b12cb7471c5b60eb32e02ff8d70adf4497e061d6c15fead186c7" 2025-12-04T11:12:53.4687765Z }, 2025-12-04T11:12:53.4687842Z { 2025-12-04T11:12:53.4687967Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4688117Z "size": 88302, 2025-12-04T11:12:53.4688280Z "digest": "sha256:6d807afc1309592c99c7d77af3874afb54c1718377fe721ac0cc616f59d291b9" 2025-12-04T11:12:53.4688452Z }, 2025-12-04T11:12:53.4688531Z { 2025-12-04T11:12:53.4688658Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4688808Z "size": 106, 
2025-12-04T11:12:53.4688961Z "digest": "sha256:60b679430e4e0b7690392dfe4f5dc417847f7a3ba2b761ce747b66d412e1d956" 2025-12-04T11:12:53.4689130Z }, 2025-12-04T11:12:53.4689206Z { 2025-12-04T11:12:53.4689331Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4689482Z "size": 1671, 2025-12-04T11:12:53.4689644Z "digest": "sha256:3992ae84f9eda1c5c52fa96b1f1d0fc3f93c661c5cf0b971a504a260c290da49" 2025-12-04T11:12:53.4689864Z }, 2025-12-04T11:12:53.4689945Z { 2025-12-04T11:12:53.4690069Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4690220Z "size": 724, 2025-12-04T11:12:53.4690421Z "digest": "sha256:8539cc3f8d8a138501ed0255c0cd7ec491bc0add9e4a62095f1c0f9533daa1cc" 2025-12-04T11:12:53.4690594Z }, 2025-12-04T11:12:53.4690674Z { 2025-12-04T11:12:53.4690798Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4690951Z "size": 138, 2025-12-04T11:12:53.4691103Z "digest": "sha256:62d400609f9c38fce4745f72372423072ba0f142b3c03775ccb317f6c5240966" 2025-12-04T11:12:53.4691271Z }, 2025-12-04T11:12:53.4691348Z { 2025-12-04T11:12:53.4691474Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4691626Z "size": 119, 2025-12-04T11:12:53.4691779Z "digest": "sha256:7e7b097490967d568331cc9f8afdd02422fe101c6364ec5e12dba2970991e533" 2025-12-04T11:12:53.4691952Z }, 2025-12-04T11:12:53.4692029Z { 2025-12-04T11:12:53.4692153Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4692306Z "size": 6231259865, 2025-12-04T11:12:53.4692481Z "digest": "sha256:7dcdbd8421cb17aaa5d0cb965ddf94e196cb364e762b12ab78024cb25e3b6bcd" 2025-12-04T11:12:53.4692657Z }, 2025-12-04T11:12:53.4692732Z { 2025-12-04T11:12:53.4692855Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4693007Z "size": 174, 2025-12-04T11:12:53.4693158Z "digest": "sha256:cbb12613719bab9f179968227f9fb8881251992804e460b9a9e1c00f3ac4a0c5" 2025-12-04T11:12:53.4693326Z }, 2025-12-04T11:12:53.4693403Z { 2025-12-04T11:12:53.4693529Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4693681Z "size": 1896, 2025-12-04T11:12:53.4693838Z "digest": "sha256:e87038dce9bc8e13bd64006847d30ddcaf77455256c4985fccfec83f82d4b925" 2025-12-04T11:12:53.4694055Z }, 2025-12-04T11:12:53.4694133Z { 2025-12-04T11:12:53.4694257Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4694410Z "size": 162783968, 2025-12-04T11:12:53.4694574Z "digest": "sha256:e4606b636f96f1c80f4be26aeb9d6f5f990f6149789c2de160451c5ac76a467d" 2025-12-04T11:12:53.4694752Z }, 2025-12-04T11:12:53.4694829Z { 2025-12-04T11:12:53.4694957Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4695109Z "size": 302, 2025-12-04T11:12:53.4695263Z "digest": "sha256:6f2a5d33b946e561219b9968769773e36ce1d28bee8c62eff652098b7825fc79" 2025-12-04T11:12:53.4695432Z }, 2025-12-04T11:12:53.4695509Z { 2025-12-04T11:12:53.4695634Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4695787Z "size": 32, 2025-12-04T11:12:53.4695944Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T11:12:53.4696119Z }, 2025-12-04T11:12:53.4696196Z { 2025-12-04T11:12:53.4696320Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4696473Z "size": 108, 2025-12-04T11:12:53.4696629Z "digest": 
"sha256:a4f2bf2f19e63b91d46f2d9cf11a25c657517a6835996404da1e79a09d918b0e" 2025-12-04T11:12:53.4696801Z }, 2025-12-04T11:12:53.4696885Z { 2025-12-04T11:12:53.4697011Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T11:12:53.4697164Z "size": 54145661, 2025-12-04T11:12:53.4697330Z "digest": "sha256:1ae00acdac56cbc6d3f81b3c5d854a2b77c30d458b0fbe18c5935145364484f0" 2025-12-04T11:12:53.4697502Z } 2025-12-04T11:12:53.4697579Z ] 2025-12-04T11:12:53.4697658Z } 2025-12-04T11:12:53.4718127Z ##[group]Run set -eux 2025-12-04T11:12:53.4718308Z set -eux 2025-12-04T11:12:53.4718549Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T11:12:53.4719156Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T11:12:53.4725271Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:53.4725499Z env: 2025-12-04T11:12:53.4725649Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:53.4725991Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:53.4726192Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:53.4726379Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:53.4726932Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:53.4727475Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:53.4727618Z AWS_REGION: us-east-1 2025-12-04T11:12:53.4727862Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:53.4728034Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:53.4730523Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:53.4730640Z ##[endgroup] 2025-12-04T11:12:53.4758113Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-12-04T11:12:53.4758614Z /home/runner/_work/_temp/d4627012-8d55-4b35-af1e-9ef509b79fb2.sh: line 3: aws: command not found 2025-12-04T11:12:53.4759046Z + jq --raw-output .SecretString 2025-12-04T11:12:53.4761021Z + jq -r .docker_hub_readonly_token 2025-12-04T11:12:53.4761290Z + docker login --username pytorchbot --password-stdin 2025-12-04T11:12:53.4858274Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T11:12:53.4864556Z + true 2025-12-04T11:12:53.4941360Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T11:12:53.4941546Z with: 2025-12-04T11:12:53.4941987Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:53.4942321Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:53.4942480Z env: 2025-12-04T11:12:53.4942578Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:53.4942718Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:53.4942897Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:53.4943065Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:53.4943587Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video 
--group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:53.4944088Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:53.4944214Z AWS_REGION: us-east-1 2025-12-04T11:12:53.4944438Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:53.4944595Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:53.4946765Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:53.4946870Z ##[endgroup] 2025-12-04T11:12:53.4953507Z ##[group]Run set -x 2025-12-04T11:12:53.4953630Z set -x 2025-12-04T11:12:53.4953740Z set +e 2025-12-04T11:12:53.4953832Z  2025-12-04T11:12:53.4953926Z login() { 2025-12-04T11:12:53.4954122Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T11:12:53.4954321Z } 2025-12-04T11:12:53.4954410Z  2025-12-04T11:12:53.4954502Z retry () { 2025-12-04T11:12:53.4954616Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T11:12:53.4954746Z } 2025-12-04T11:12:53.4954839Z  2025-12-04T11:12:53.4954944Z retry login "${DOCKER_REGISTRY}" 2025-12-04T11:12:53.4955071Z  2025-12-04T11:12:53.4955266Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T11:12:53.4955513Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T11:12:53.4955660Z  2025-12-04T11:12:53.4955745Z set -e 2025-12-04T11:12:53.4955885Z # ignore output since only exit code is used for conditional 2025-12-04T11:12:53.4956075Z # only pull docker image if it's not available locally 2025-12-04T11:12:53.4956281Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T11:12:53.4956471Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T11:12:53.4956600Z fi 2025-12-04T11:12:53.4961012Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:53.4961168Z env: 2025-12-04T11:12:53.4961270Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:53.4961406Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:53.4961589Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:53.4961758Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:53.4962259Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:53.4962753Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:53.4962878Z AWS_REGION: us-east-1 2025-12-04T11:12:53.4963022Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:53.4963183Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:53.4965288Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:53.4965652Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:53.4965973Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:53.4966172Z ##[endgroup] 2025-12-04T11:12:53.4985492Z + set +e 2025-12-04T11:12:53.4985846Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:53.4986177Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:53.4989737Z + aws ecr get-login-password --region us-east-1 2025-12-04T11:12:53.4990198Z /home/runner/_work/_temp/8ae44aba-8349-4151-ae99-0e93b635e86a.sh: line 5: aws: command not found 2025-12-04T11:12:53.4990714Z + docker login -u 
AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:53.5067111Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T11:12:53.5075848Z + sleep 1 2025-12-04T11:12:54.5088316Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:54.5091305Z + aws ecr get-login-password --region us-east-1 2025-12-04T11:12:54.5091936Z /home/runner/_work/_temp/8ae44aba-8349-4151-ae99-0e93b635e86a.sh: line 5: aws: command not found 2025-12-04T11:12:54.5092691Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:54.5183796Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T11:12:54.5194210Z + sleep 2 2025-12-04T11:12:56.5205570Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:56.5208661Z + aws ecr get-login-password --region us-east-1 2025-12-04T11:12:56.5209259Z /home/runner/_work/_temp/8ae44aba-8349-4151-ae99-0e93b635e86a.sh: line 5: aws: command not found 2025-12-04T11:12:56.5210145Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T11:12:56.5298281Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T11:12:56.5315669Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:56.5316277Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T11:12:57.8492836Z + IMAGE_SIZE=18579.916069984436 2025-12-04T11:12:57.8493399Z + echo 'Compressed size of image in MB: 18579.916069984436' 2025-12-04T11:12:57.8493825Z + set -e 2025-12-04T11:12:57.8494635Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:12:57.8495530Z Compressed size of image in MB: 18579.916069984436 2025-12-04T11:12:57.8674210Z Prepare all required actions 2025-12-04T11:12:57.8689624Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-12-04T11:12:57.8689844Z with: 2025-12-04T11:12:57.8690155Z github-token: *** 2025-12-04T11:12:57.8690261Z env: 2025-12-04T11:12:57.8690361Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:12:57.8690506Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:57.8690689Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:57.8690862Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:57.8691389Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:57.8691889Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:57.8692027Z AWS_REGION: us-east-1 2025-12-04T11:12:57.8692207Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:57.8692362Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:57.8694485Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:57.8694596Z ##[endgroup] 2025-12-04T11:12:57.8701227Z ##[group]Run set -eux 2025-12-04T11:12:57.8701351Z set -eux 2025-12-04T11:12:57.8701524Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-12-04T11:12:57.8705770Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:12:57.8706049Z env: 2025-12-04T11:12:57.8706151Z GIT_DEFAULT_BRANCH: 
main 2025-12-04T11:12:57.8706319Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:12:57.8706504Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:12:57.8706678Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:12:57.8707183Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:12:57.8707673Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:12:57.8707799Z AWS_REGION: us-east-1 2025-12-04T11:12:57.8707952Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:12:57.8708114Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:12:57.8710281Z AWS_SESSION_TOKEN: *** 2025-12-04T11:12:57.8710456Z GITHUB_TOKEN: *** 2025-12-04T11:12:57.8710566Z ##[endgroup] 2025-12-04T11:12:57.8730935Z + python3 .github/scripts/get_workflow_job_id.py 19922798714 linux.rocm.gpu.gfx942.4.b-bphpw-runner-rf5f6 2025-12-04T11:12:58.6169974Z Setting output job-id=57117547552 2025-12-04T11:12:58.6170645Z Setting output job-name=linux-noble-rocm-py3.12-mi300 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4.b, module:rocm, oncall:distributed, mem_leak_check) 2025-12-04T11:12:58.6267239Z Prepare all required actions 2025-12-04T11:12:58.6267452Z Getting action download info 2025-12-04T11:12:58.8913978Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T11:12:59.8105946Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-12-04T11:13:00.6013149Z ##[group]Run ./.github/actions/download-build-artifacts 2025-12-04T11:13:00.6013313Z with: 2025-12-04T11:13:00.6013428Z name: linux-noble-rocm-py3.12-mi300 2025-12-04T11:13:00.6013569Z s3-bucket: gha-artifacts 2025-12-04T11:13:00.6013694Z env: 2025-12-04T11:13:00.6013795Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:00.6013940Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:00.6014123Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:00.6014296Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:00.6014831Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:00.6015324Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:00.6015444Z AWS_REGION: us-east-1 2025-12-04T11:13:00.6015622Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:00.6015781Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:00.6017896Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:00.6018006Z ##[endgroup] 2025-12-04T11:13:00.6055369Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T11:13:00.6055520Z with: 2025-12-04T11:13:00.6055637Z name: linux-noble-rocm-py3.12-mi300 2025-12-04T11:13:00.6055778Z s3-bucket: gha-artifacts 2025-12-04T11:13:00.6055890Z region: us-east-1 2025-12-04T11:13:00.6055988Z env: 2025-12-04T11:13:00.6056083Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:00.6056226Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:00.6056404Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:00.6056576Z 
RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:00.6057083Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:00.6057580Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:00.6078422Z AWS_REGION: us-east-1 2025-12-04T11:13:00.6078592Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:00.6078760Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:00.6080929Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:00.6081034Z ##[endgroup] 2025-12-04T11:13:00.8352930Z (node:17074) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T11:13:00.8353260Z 2025-12-04T11:13:00.8353397Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T11:13:00.8353757Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T11:13:00.8354128Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T11:13:01.1062938Z Found 1 objects with prefix pytorch/pytorch/19922798714/linux-noble-rocm-py3.12-mi300/ 2025-12-04T11:13:01.1063472Z Starting download (1/1): /home/runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T11:13:34.8168635Z Finished download (1/1): /home/runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T11:13:34.8171994Z Artifact download has finished successfully 2025-12-04T11:13:34.8461471Z ##[group]Run unzip -o artifacts.zip 2025-12-04T11:13:34.8461674Z unzip -o artifacts.zip 2025-12-04T11:13:34.8466170Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:34.8466325Z env: 2025-12-04T11:13:34.8466586Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:34.8466727Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:34.8466903Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:34.8467078Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:34.8467583Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:34.8468085Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:34.8468201Z AWS_REGION: us-east-1 2025-12-04T11:13:34.8468384Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:34.8468541Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:34.8470696Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:34.8470809Z ##[endgroup] 2025-12-04T11:13:34.8508449Z Archive: artifacts.zip 2025-12-04T11:13:34.8510239Z creating: dist/ 2025-12-04T11:13:37.7924242Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp312-cp312-linux_x86_64.whl 2025-12-04T11:13:37.8003299Z inflating: dist/.ninja_log 2025-12-04T11:13:37.8008747Z creating: build/custom_test_artifacts/ 2025-12-04T11:13:37.8009242Z creating: build/custom_test_artifacts/custom-op-build/ 2025-12-04T11:13:37.8009676Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-12-04T11:13:37.8010223Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-12-04T11:13:37.8010778Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T11:13:37.8011359Z creating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/ 2025-12-04T11:13:37.8011886Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T11:13:37.8012452Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T11:13:37.8012995Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T11:13:37.8013582Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T11:13:37.8014107Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T11:13:37.8014603Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T11:13:37.8015077Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T11:13:37.8016219Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T11:13:37.8016785Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T11:13:37.8017339Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T11:13:37.8017842Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T11:13:37.8018398Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T11:13:37.8018974Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T11:13:37.8019475Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T11:13:37.8019931Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T11:13:37.8020368Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T11:13:37.8020814Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T11:13:37.8021292Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T11:13:37.8022029Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T11:13:37.8022548Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-12-04T11:13:37.8023042Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T11:13:37.8023541Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T11:13:37.8024043Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T11:13:37.8024580Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T11:13:37.8025025Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T11:13:37.8025399Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T11:13:37.8028996Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T11:13:37.8145328Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T11:13:37.8145896Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.d 2025-12-04T11:13:37.8146345Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T11:13:37.8146808Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T11:13:37.8147370Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T11:13:37.8147841Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T11:13:37.8148307Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T11:13:37.8148771Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T11:13:37.8149220Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T11:13:37.8149673Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T11:13:37.8150158Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T11:13:37.8150598Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T11:13:37.8160570Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-12-04T11:13:37.8208117Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T11:13:37.8208533Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.d 2025-12-04T11:13:37.8208924Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T11:13:37.8209300Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T11:13:37.8209630Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T11:13:37.8209988Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T11:13:37.8210296Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T11:13:37.8210627Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_outer_vec.cc 2025-12-04T11:13:37.8210939Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_vec_ext.cc 2025-12-04T11:13:37.8211778Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T11:13:37.8212290Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T11:13:37.8212571Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T11:13:37.8314540Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T11:13:37.8348276Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T11:13:37.8348544Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T11:13:37.8348770Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-12-04T11:13:37.8349032Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T11:13:37.8351710Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T11:13:37.8352454Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 
2025-12-04T11:13:37.8353131Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T11:13:37.8353874Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T11:13:37.8354564Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T11:13:37.8355355Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T11:13:37.8356156Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T11:13:37.8356903Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T11:13:37.8357620Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T11:13:37.8358329Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T11:13:37.8359170Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T11:13:37.8360053Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T11:13:37.8360815Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T11:13:37.8361642Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T11:13:37.8362511Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T11:13:37.8363271Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T11:13:37.8363866Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T11:13:37.8364321Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T11:13:37.8364619Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T11:13:37.8364942Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T11:13:37.8365296Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T11:13:37.8365639Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T11:13:37.8365960Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T11:13:37.8366289Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T11:13:37.8366623Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T11:13:37.8366959Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T11:13:37.8367287Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T11:13:37.8367754Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T11:13:37.8373034Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T11:13:37.8409948Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T11:13:37.8410377Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.d 2025-12-04T11:13:37.8410774Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T11:13:37.8411070Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T11:13:37.8411340Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T11:13:37.8411592Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T11:13:37.8412223Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T11:13:37.8412479Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_outer_vec.cc 2025-12-04T11:13:37.8412722Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_vec_ext.cc 2025-12-04T11:13:37.8413559Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T11:13:37.8414009Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-12-04T11:13:37.8414314Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T11:13:37.8437205Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T11:13:37.8437458Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T11:13:37.8437696Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T11:13:37.8437970Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T11:13:37.8440149Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T11:13:37.8440468Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T11:13:37.8440766Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T11:13:37.8441089Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T11:13:37.8441403Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T11:13:37.8442017Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T11:13:37.8442728Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T11:13:37.8443057Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T11:13:37.8443385Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T11:13:37.8443691Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T11:13:37.8444711Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T11:13:37.8445326Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T11:13:37.8445759Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T11:13:37.8446717Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T11:13:37.8447360Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T11:13:37.8447694Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-12-04T11:13:37.8448028Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T11:13:37.8448313Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T11:13:37.8448614Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T11:13:37.8448940Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T11:13:37.8449310Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T11:13:37.8449681Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T11:13:37.8450056Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T11:13:37.8450400Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T11:13:37.8450747Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T11:13:37.8451094Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T11:13:37.8451444Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T11:13:37.8451789Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T11:13:37.8452429Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T11:13:37.8522613Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T11:13:37.8522937Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.d 2025-12-04T11:13:37.8523237Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T11:13:37.8523562Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T11:13:37.8523919Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T11:13:37.8524258Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T11:13:37.8524579Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T11:13:37.8524965Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T11:13:37.8525294Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-12-04T11:13:37.8525632Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T11:13:37.8525959Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T11:13:37.8526283Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T11:13:37.8537262Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 
2025-12-04T11:13:37.8569602Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T11:13:37.8570001Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.d 2025-12-04T11:13:37.8570327Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T11:13:37.8570697Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T11:13:37.8570979Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T11:13:37.8571241Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-12-04T11:13:37.8571725Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T11:13:37.8572000Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_outer_vec.cc 2025-12-04T11:13:37.8572270Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_vec_ext.cc 2025-12-04T11:13:37.8573178Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T11:13:37.8573592Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T11:13:37.8574218Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T11:13:37.8634382Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T11:13:37.8657899Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T11:13:37.8658099Z creating: build/lib/ 2025-12-04T11:13:37.8707431Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T11:13:37.8973283Z inflating: build/lib/libprotobuf.a 2025-12-04T11:13:37.9271170Z inflating: build/lib/libprotoc.a 2025-12-04T11:13:37.9276890Z inflating: build/lib/libpthreadpool.a 2025-12-04T11:13:37.9281630Z inflating: build/lib/libcpuinfo.a 2025-12-04T11:13:37.9286124Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T11:13:37.9286709Z inflating: build/lib/libclog.a 2025-12-04T11:13:37.9298170Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T11:13:37.9299239Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T11:13:37.9411926Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T11:13:37.9422570Z inflating: build/lib/libnnpack.a 2025-12-04T11:13:37.9951515Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T11:13:37.9992803Z inflating: build/lib/libgtest.a 2025-12-04T11:13:38.0003043Z inflating: build/lib/libgmock.a 2025-12-04T11:13:38.0003265Z inflating: build/lib/libgtest_main.a 2025-12-04T11:13:38.0003464Z inflating: build/lib/libgmock_main.a 2025-12-04T11:13:38.0057915Z inflating: build/lib/libXNNPACK.a 2025-12-04T11:13:38.0103442Z inflating: build/lib/libbenchmark.a 2025-12-04T11:13:38.0103674Z inflating: build/lib/libbenchmark_main.a 2025-12-04T11:13:38.0143510Z inflating: build/lib/libasmjit.a 2025-12-04T11:13:38.0143736Z inflating: build/lib/libjitprofiling.a 2025-12-04T11:13:38.0148468Z inflating: build/lib/libittnotify.a 2025-12-04T11:13:38.0840989Z inflating: build/lib/libfbgemm.a 2025-12-04T11:13:38.0859414Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T11:13:38.1183767Z inflating: build/lib/libtensorpipe.a 2025-12-04T11:13:38.1256289Z inflating: build/lib/libgloo.a 2025-12-04T11:13:38.1284145Z inflating: build/lib/libonnx_proto.a 2025-12-04T11:13:38.1531884Z inflating: build/lib/libgloo_hip.a 
2025-12-04T11:13:38.1959997Z inflating: build/lib/libonnx.a 2025-12-04T11:13:38.7997978Z inflating: build/lib/libdnnl.a 2025-12-04T11:13:38.8009409Z inflating: build/lib/libfmt.a 2025-12-04T11:13:38.8196557Z inflating: build/lib/libkineto.a 2025-12-04T11:13:38.8267223Z inflating: build/lib/libc10.so 2025-12-04T11:13:38.8267823Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T11:13:38.8268533Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T11:13:38.8295612Z inflating: build/lib/libc10_hip.so 2025-12-04T11:13:38.8582208Z inflating: build/lib/libfbgemm_genai.a 2025-12-04T11:13:40.7146957Z inflating: build/lib/libtorch_cpu.so 2025-12-04T11:13:40.7148147Z inflating: build/lib/libshm.so 2025-12-04T11:13:41.5677767Z inflating: build/lib/libtorch_hip.so 2025-12-04T11:13:41.5678220Z inflating: build/lib/libtorch.so 2025-12-04T11:13:41.5689835Z inflating: build/lib/libjitbackend_test.so 2025-12-04T11:13:41.5703462Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T11:13:41.5746306Z inflating: build/lib/libtorchbind_test.so 2025-12-04T11:13:41.5762161Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T11:13:41.7211438Z inflating: build/lib/libtorch_python.so 2025-12-04T11:13:41.7233439Z inflating: build/lib/libnnapi_backend.so 2025-12-04T11:13:41.7233739Z creating: build/bin/ 2025-12-04T11:13:41.7233978Z creating: build/bin/CMakeFiles/ 2025-12-04T11:13:41.7234241Z inflating: build/bin/cmake_install.cmake 2025-12-04T11:13:41.7234529Z inflating: build/bin/CTestTestfile.cmake 2025-12-04T11:13:41.7512720Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T11:13:41.7790816Z inflating: build/bin/protoc 2025-12-04T11:13:41.7827098Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T11:13:41.7861203Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T11:13:41.7895767Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T11:13:41.7930449Z inflating: build/bin/c10_Device_test 2025-12-04T11:13:41.7963966Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T11:13:41.7999987Z inflating: build/bin/c10_Scalar_test 2025-12-04T11:13:41.8039399Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T11:13:41.8075891Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T11:13:41.8113642Z inflating: build/bin/c10_SymInt_test 2025-12-04T11:13:41.8150877Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T11:13:41.8188103Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T11:13:41.8221560Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T11:13:41.8268007Z inflating: build/bin/c10_cow_test 2025-12-04T11:13:41.8301108Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T11:13:41.8334395Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T11:13:41.8372653Z inflating: build/bin/c10_Enumerate_test 2025-12-04T11:13:41.8408233Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T11:13:41.8442353Z inflating: build/bin/c10_Half_test 2025-12-04T11:13:41.8477734Z inflating: build/bin/c10_Bitset_test 2025-12-04T11:13:41.8515051Z inflating: build/bin/c10_LeftRight_test 2025-12-04T11:13:41.8548500Z inflating: build/bin/c10_Semaphore_test 2025-12-04T11:13:41.8584145Z inflating: build/bin/c10_NetworkFlow_test 2025-12-04T11:13:41.8621140Z inflating: build/bin/c10_ThreadLocal_test 2025-12-04T11:13:41.8654907Z inflating: build/bin/c10_Synchronized_test 2025-12-04T11:13:41.8689639Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T11:13:41.8724425Z inflating: build/bin/c10_accumulate_test 2025-12-04T11:13:41.8757806Z inflating: build/bin/c10_error_test 
2025-12-04T11:13:41.8791827Z inflating: build/bin/c10_bit_cast_test 2025-12-04T11:13:41.8829074Z inflating: build/bin/c10_bfloat16_test 2025-12-04T11:13:41.8866037Z inflating: build/bin/c10_complex_test 2025-12-04T11:13:41.8901189Z inflating: build/bin/c10_exception_test 2025-12-04T11:13:41.8938902Z inflating: build/bin/c10_complex_math_test 2025-12-04T11:13:41.8972928Z inflating: build/bin/c10_flags_test 2025-12-04T11:13:41.9007007Z inflating: build/bin/c10_generic_math_test 2025-12-04T11:13:41.9041439Z inflating: build/bin/c10_irange_test 2025-12-04T11:13:41.9140962Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T11:13:41.9176898Z inflating: build/bin/c10_lazy_test 2025-12-04T11:13:41.9215162Z inflating: build/bin/c10_logging_test 2025-12-04T11:13:41.9248643Z inflating: build/bin/c10_nofatal_test 2025-12-04T11:13:41.9297968Z inflating: build/bin/c10_optional_test 2025-12-04T11:13:41.9333662Z inflating: build/bin/c10_registry_test 2025-12-04T11:13:41.9374720Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T11:13:41.9471627Z inflating: build/bin/c10_small_vector_test 2025-12-04T11:13:41.9506436Z inflating: build/bin/c10_ssize_test 2025-12-04T11:13:41.9544174Z inflating: build/bin/c10_string_util_test 2025-12-04T11:13:41.9577187Z inflating: build/bin/c10_string_view_test 2025-12-04T11:13:41.9606737Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T11:13:41.9640728Z inflating: build/bin/c10_tempfile_test 2025-12-04T11:13:41.9678325Z inflating: build/bin/c10_typeid_test 2025-12-04T11:13:41.9711407Z inflating: build/bin/c10_hip_HIPAssertionsTest_1_var_test 2025-12-04T11:13:41.9744783Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_stream 2025-12-04T11:13:41.9777915Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_thread_and_block_and_device 2025-12-04T11:13:41.9810777Z inflating: build/bin/c10_hip_HIPAssertionsTest_from_2_processes 2025-12-04T11:13:41.9843667Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T11:13:41.9876678Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T11:13:41.9909733Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_same_block 2025-12-04T11:13:41.9942814Z inflating: build/bin/c10_hip_HIPTest 2025-12-04T11:13:42.0304877Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T11:13:42.0677314Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T11:13:42.1055034Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T11:13:42.1118387Z inflating: build/bin/test_aoti_abi_check 2025-12-04T11:13:42.1151559Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T11:13:42.1185067Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T11:13:42.1218598Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T11:13:42.1254005Z inflating: build/bin/BackoffTest 2025-12-04T11:13:42.1289867Z inflating: build/bin/FileStoreTest 2025-12-04T11:13:42.1327568Z inflating: build/bin/TCPStoreTest 2025-12-04T11:13:42.1363499Z inflating: build/bin/HashStoreTest 2025-12-04T11:13:42.1407699Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T11:13:42.1409295Z inflating: build/bin/example_allreduce 2025-12-04T11:13:42.1411272Z inflating: build/bin/torch_shm_manager 2025-12-04T11:13:42.1447586Z inflating: build/bin/static_runtime_bench 2025-12-04T11:13:42.1606185Z inflating: build/bin/static_runtime_test 2025-12-04T11:13:42.1654680Z inflating: build/bin/Dict_test 2025-12-04T11:13:42.1689920Z inflating: build/bin/Dimname_test 
2025-12-04T11:13:42.1732987Z inflating: build/bin/MaybeOwned_test 2025-12-04T11:13:42.1770943Z inflating: build/bin/NamedTensor_test 2025-12-04T11:13:42.1810110Z inflating: build/bin/apply_utils_test 2025-12-04T11:13:42.1849548Z inflating: build/bin/atest 2025-12-04T11:13:42.1891911Z inflating: build/bin/basic 2025-12-04T11:13:42.1928424Z inflating: build/bin/broadcast_test 2025-12-04T11:13:42.1962523Z inflating: build/bin/cpu_allocator_test 2025-12-04T11:13:42.2001129Z inflating: build/bin/cpu_generator_test 2025-12-04T11:13:42.2036690Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T11:13:42.2096462Z inflating: build/bin/cpu_rng_test 2025-12-04T11:13:42.2131092Z inflating: build/bin/dlconvertor_test 2025-12-04T11:13:42.2169495Z inflating: build/bin/extension_backend_test 2025-12-04T11:13:42.2206767Z inflating: build/bin/half_test 2025-12-04T11:13:42.2270122Z inflating: build/bin/ivalue_test 2025-12-04T11:13:42.2303710Z inflating: build/bin/lazy_tensor_test 2025-12-04T11:13:42.2339052Z inflating: build/bin/math_kernel_test 2025-12-04T11:13:42.2374707Z inflating: build/bin/memory_format_test 2025-12-04T11:13:42.2410653Z inflating: build/bin/memory_overlapping_test 2025-12-04T11:13:42.2444885Z inflating: build/bin/operator_name_test 2025-12-04T11:13:42.2480547Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T11:13:42.2517696Z inflating: build/bin/native_test 2025-12-04T11:13:42.2552625Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T11:13:42.2586765Z inflating: build/bin/operators_test 2025-12-04T11:13:42.2631434Z inflating: build/bin/pow_test 2025-12-04T11:13:42.2669117Z inflating: build/bin/quantized_test 2025-12-04T11:13:42.2703809Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T11:13:42.2737510Z inflating: build/bin/reduce_ops_test 2025-12-04T11:13:42.2771973Z inflating: build/bin/StorageUtils_test 2025-12-04T11:13:42.2810272Z inflating: build/bin/scalar_test 2025-12-04T11:13:42.2847730Z inflating: build/bin/scalar_tensor_test 2025-12-04T11:13:42.2882713Z inflating: build/bin/stride_properties_test 2025-12-04T11:13:42.2934530Z inflating: build/bin/tensor_iterator_test 2025-12-04T11:13:42.2970765Z inflating: build/bin/test_parallel 2025-12-04T11:13:42.3007714Z inflating: build/bin/type_ptr_test 2025-12-04T11:13:42.3041807Z inflating: build/bin/thread_init_test 2025-12-04T11:13:42.3077279Z inflating: build/bin/undefined_tensor_test 2025-12-04T11:13:42.3116631Z inflating: build/bin/type_test 2025-12-04T11:13:42.3149930Z inflating: build/bin/verify_api_visibility 2025-12-04T11:13:42.3184068Z inflating: build/bin/weakref_test 2025-12-04T11:13:42.3231150Z inflating: build/bin/legacy_vmap_test 2025-12-04T11:13:42.3265616Z inflating: build/bin/wrapdim_test 2025-12-04T11:13:42.3305151Z inflating: build/bin/IListRef_test 2025-12-04T11:13:42.3339512Z inflating: build/bin/xla_tensor_test 2025-12-04T11:13:42.3407374Z inflating: build/bin/List_test 2025-12-04T11:13:42.3484959Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T11:13:42.3546957Z inflating: build/bin/kernel_function_test 2025-12-04T11:13:42.3590435Z inflating: build/bin/KernelFunction_test 2025-12-04T11:13:42.3671527Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T11:13:42.3737316Z inflating: build/bin/kernel_lambda_test 2025-12-04T11:13:42.3799426Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T11:13:42.3839445Z inflating: build/bin/kernel_stackbased_test 2025-12-04T11:13:42.3873867Z inflating: build/bin/CppSignature_test 2025-12-04T11:13:42.3907042Z 
inflating: build/bin/op_allowlist_test 2025-12-04T11:13:42.4100224Z inflating: build/bin/op_registration_test 2025-12-04T11:13:42.4133417Z inflating: build/bin/hip_complex_math_test 2025-12-04T11:13:42.4177737Z inflating: build/bin/inline_container_test 2025-12-04T11:13:42.4214462Z inflating: build/bin/backend_fallback_test 2025-12-04T11:13:42.4249888Z inflating: build/bin/hip_apply_test 2025-12-04T11:13:42.4283099Z inflating: build/bin/hip_complex_test 2025-12-04T11:13:42.4315853Z inflating: build/bin/hip_distributions_test 2025-12-04T11:13:42.4348950Z inflating: build/bin/hip_generator_test 2025-12-04T11:13:42.4381907Z inflating: build/bin/hip_half_test 2025-12-04T11:13:42.4415012Z inflating: build/bin/hip_integer_divider_test 2025-12-04T11:13:42.4448005Z inflating: build/bin/hip_optional_test 2025-12-04T11:13:42.4481015Z inflating: build/bin/hip_packedtensoraccessor_test 2025-12-04T11:13:42.4515875Z inflating: build/bin/hip_dlconvertor_test 2025-12-04T11:13:42.4548823Z inflating: build/bin/hip_vectorized_test 2025-12-04T11:13:42.5230346Z inflating: build/bin/test_jit 2025-12-04T11:13:42.5446822Z inflating: build/bin/test_lazy 2025-12-04T11:13:42.5483813Z inflating: build/bin/test_dist_autograd 2025-12-04T11:13:42.5529081Z inflating: build/bin/test_cpp_rpc 2025-12-04T11:13:42.5530223Z inflating: build/bin/parallel_benchmark 2025-12-04T11:13:42.6256737Z inflating: build/bin/test_api 2025-12-04T11:13:42.6257031Z creating: .additional_ci_files/ 2025-12-04T11:13:42.6295603Z inflating: .additional_ci_files/test-times.json 2025-12-04T11:13:42.6438214Z inflating: .additional_ci_files/test-class-times.json 2025-12-04T11:13:42.6468038Z ##[group]Run rm artifacts.zip 2025-12-04T11:13:42.6468261Z rm artifacts.zip 2025-12-04T11:13:42.6473457Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:42.6473635Z env: 2025-12-04T11:13:42.6473754Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:42.6473913Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:42.6474116Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:42.6474313Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:42.6474881Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:42.6475451Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:42.6475589Z AWS_REGION: us-east-1 2025-12-04T11:13:42.6475795Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:42.6475975Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:42.6478353Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:42.6478482Z ##[endgroup] 2025-12-04T11:13:42.7622663Z ##[group]Run df -H 2025-12-04T11:13:42.7622817Z df -H 2025-12-04T11:13:42.7627986Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:42.7628180Z env: 2025-12-04T11:13:42.7628309Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:42.7628482Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:42.7628722Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:42.7628940Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:42.7629605Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add 
video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:42.7630263Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:42.7630402Z AWS_REGION: us-east-1 2025-12-04T11:13:42.7630586Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:42.7630748Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:42.7632968Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:42.7633080Z ##[endgroup] 2025-12-04T11:13:42.8081052Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T11:13:42.8081359Z overlay 16T 366G 15T 3% / 2025-12-04T11:13:42.8081615Z tmpfs 68M 0 68M 0% /dev 2025-12-04T11:13:42.8081861Z /dev/md0 16T 366G 15T 3% /run 2025-12-04T11:13:42.8082105Z shm 68M 17k 68M 1% /dev/shm 2025-12-04T11:13:42.8082523Z amdprj2-k8s_2 5.5T 120G 5.4T 3% /home/runner/pytorch-data 2025-12-04T11:13:42.8083190Z tmpfs 3.3T 13k 3.3T 1% /run/secrets/kubernetes.io/serviceaccount 2025-12-04T11:13:42.8083542Z tmpfs 1.7T 0 1.7T 0% /proc/acpi 2025-12-04T11:13:42.8083797Z tmpfs 1.7T 0 1.7T 0% /proc/scsi 2025-12-04T11:13:42.8084056Z tmpfs 1.7T 0 1.7T 0% /sys/firmware 2025-12-04T11:13:42.8084370Z tmpfs 1.7T 0 1.7T 0% /sys/devices/virtual/powercap 2025-12-04T11:13:42.8111121Z Prepare all required actions 2025-12-04T11:13:42.8111375Z Getting action download info 2025-12-04T11:13:43.0334210Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T11:13:43.0334347Z with: 2025-12-04T11:13:43.0334436Z env: 2025-12-04T11:13:43.0334530Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:43.0334667Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:43.0334846Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:43.0335016Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:43.0335519Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:43.0336010Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:43.0336140Z AWS_REGION: us-east-1 2025-12-04T11:13:43.0336323Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:43.0336473Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:43.0338600Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:43.0338703Z ##[endgroup] 2025-12-04T11:13:43.0351668Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T11:13:43.0351813Z with: 2025-12-04T11:13:43.0351916Z name: td_results 2025-12-04T11:13:43.0352028Z s3-bucket: gha-artifacts 2025-12-04T11:13:43.0352144Z region: us-east-1 2025-12-04T11:13:43.0352247Z env: 2025-12-04T11:13:43.0352355Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:43.0352500Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:43.0352686Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:43.0352860Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:43.0353373Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:43.0353866Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:43.0353993Z AWS_REGION: us-east-1 2025-12-04T11:13:43.0354132Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:43.0354293Z 
AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:43.0356421Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:43.0356532Z ##[endgroup] 2025-12-04T11:13:43.2689479Z (node:17105) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T11:13:43.2690034Z 2025-12-04T11:13:43.2690215Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T11:13:43.2690673Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T11:13:43.2691128Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T11:13:43.5359956Z Found 1 objects with prefix pytorch/pytorch/19922798714/td_results/ 2025-12-04T11:13:43.5360394Z Starting download (1/1): /home/runner/_work/pytorch/pytorch/td_results.json 2025-12-04T11:13:43.9904866Z Finished download (1/1): /home/runner/_work/pytorch/pytorch/td_results.json 2025-12-04T11:13:43.9908758Z Artifact download has finished successfully 2025-12-04T11:13:44.0076945Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T11:13:44.0077169Z mkdir -p .additional_ci_files 2025-12-04T11:13:44.0077404Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T11:13:44.0082373Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:44.0082542Z env: 2025-12-04T11:13:44.0082650Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:44.0082800Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:44.0082996Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:44.0083176Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:44.0083908Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:44.0084452Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:44.0084579Z AWS_REGION: us-east-1 2025-12-04T11:13:44.0084794Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:44.0084963Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:44.0087318Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:44.0087437Z ##[endgroup] 2025-12-04T11:13:44.0143317Z ##[group]Run .github/scripts/parse_ref.py 2025-12-04T11:13:44.0143465Z .github/scripts/parse_ref.py 2025-12-04T11:13:44.0145892Z shell: /usr/bin/bash -e {0} 2025-12-04T11:13:44.0145998Z env: 2025-12-04T11:13:44.0146090Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:44.0146226Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:44.0146401Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:44.0146566Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:44.0147064Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:44.0147558Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:44.0147673Z AWS_REGION: us-east-1 2025-12-04T11:13:44.0147817Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:44.0147967Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:44.0150147Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:44.0150255Z ##[endgroup] 2025-12-04T11:13:44.0247865Z Setting output branch=main 2025-12-04T11:13:44.0312560Z 
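The parse_ref.py step above reduces the checkout ref to a branch output (here, branch=main). A minimal sketch of what such a ref parser plausibly does, assuming the standard name=value protocol of the $GITHUB_OUTPUT file; this is illustrative only, not the actual pytorch/pytorch script:

import os

def main():
    # GITHUB_REF looks like refs/heads/<branch> or refs/tags/<tag>.
    ref = os.environ.get("GITHUB_REF", "")
    outputs = {}
    if ref.startswith("refs/heads/"):
        outputs["branch"] = ref[len("refs/heads/"):]
    elif ref.startswith("refs/tags/"):
        outputs["tag"] = ref[len("refs/tags/"):]
    # Steps publish outputs by appending name=value lines to $GITHUB_OUTPUT.
    with open(os.environ["GITHUB_OUTPUT"], "a") as fh:
        for name, value in outputs.items():
            print(f"Setting output {name}={value}")
            fh.write(f"{name}={value}\n")

if __name__ == "__main__":
    main()

On this scheduled run GITHUB_REF is refs/heads/main, which matches the "Setting output branch=main" lines logged above and below.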
Prepare all required actions 2025-12-04T11:13:44.0312779Z Getting action download info 2025-12-04T11:13:44.2484027Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T11:13:44.2484181Z with: 2025-12-04T11:13:44.2484386Z github-token: *** 2025-12-04T11:13:44.2485656Z test-matrix: {"include": [{"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T11:13:44.2487145Z job-name: linux-noble-rocm-py3.12-mi300 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4.b, module:rocm, oncall:distributed, mem_leak_check) 2025-12-04T11:13:44.2487398Z env: 2025-12-04T11:13:44.2487604Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:44.2487750Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:44.2487934Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:44.2488105Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:44.2488611Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:44.2489102Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:44.2489226Z AWS_REGION: us-east-1 2025-12-04T11:13:44.2489355Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:44.2489512Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:44.2491687Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:44.2491796Z ##[endgroup] 2025-12-04T11:13:44.2517988Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T11:13:44.2518124Z with: 2025-12-04T11:13:44.2518218Z shell: bash 2025-12-04T11:13:44.2518319Z timeout_minutes: 10 2025-12-04T11:13:44.2518425Z max_attempts: 5 2025-12-04T11:13:44.2518530Z retry_wait_seconds: 30 2025-12-04T11:13:44.2518827Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T11:13:44.2519133Z polling_interval_seconds: 1 2025-12-04T11:13:44.2519248Z warning_on_retry: true 2025-12-04T11:13:44.2519356Z continue_on_error: false 2025-12-04T11:13:44.2519464Z env: 2025-12-04T11:13:44.2519559Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:44.2519747Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:44.2519931Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 
2025-12-04T11:13:44.2520096Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:44.2520592Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:44.2521212Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:44.2521331Z AWS_REGION: us-east-1 2025-12-04T11:13:44.2521461Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:44.2521611Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:44.2523718Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:44.2523875Z GITHUB_TOKEN: *** 2025-12-04T11:13:44.2523974Z ##[endgroup] 2025-12-04T11:13:44.2915642Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T11:13:44.4340784Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T11:13:44.5261064Z Collecting requests==2.27.1 2025-12-04T11:13:44.5626093Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T11:13:44.5728044Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.1/63.1 KB 6.2 MB/s eta 0:00:00 2025-12-04T11:13:44.6182904Z Collecting pyyaml==6.0.2 2025-12-04T11:13:44.6236940Z Downloading PyYAML-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (751 kB) 2025-12-04T11:13:44.6610910Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 751.2/751.2 KB 20.7 MB/s eta 0:00:00 2025-12-04T11:13:44.6946651Z Collecting urllib3<1.27,>=1.21.1 2025-12-04T11:13:44.7008565Z Downloading urllib3-1.26.20-py2.py3-none-any.whl (144 kB) 2025-12-04T11:13:44.7066911Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 144.2/144.2 KB 29.8 MB/s eta 0:00:00 2025-12-04T11:13:44.7260799Z Collecting certifi>=2017.4.17 2025-12-04T11:13:44.7324173Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T11:13:44.7411137Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 159.4/159.4 KB 25.0 MB/s eta 0:00:00 2025-12-04T11:13:44.8377039Z Collecting charset-normalizer~=2.0.0 2025-12-04T11:13:44.8430831Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T11:13:44.8590227Z Collecting idna<4,>=2.5 2025-12-04T11:13:44.8641738Z Downloading idna-3.11-py3-none-any.whl (71 kB) 2025-12-04T11:13:44.8661376Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.0/71.0 KB 88.7 MB/s eta 0:00:00 2025-12-04T11:13:44.9229560Z Installing collected packages: urllib3, pyyaml, idna, charset-normalizer, certifi, requests 2025-12-04T11:13:45.0150384Z WARNING: The script normalizer is installed in '/home/runner/.local/bin' which is not on PATH. 2025-12-04T11:13:45.0150839Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-12-04T11:13:45.0319216Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 idna-3.11 pyyaml-6.0.2 requests-2.27.1 urllib3-1.26.20 2025-12-04T11:13:45.2915604Z Command completed after 1 attempt(s). 
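The nick-fields/retry@v3.0.0 step above wraps the pinned pip install so a transient network failure does not fail the job: up to max_attempts=5 runs with retry_wait_seconds=30 between them, and it logged first-try success here. A hedged Python sketch of that retry pattern, with parameter names mirroring the action's inputs; it is not the action's actual implementation:

import subprocess
import sys
import time

def run_with_retry(cmd, max_attempts=5, retry_wait_seconds=30):
    # Run a shell command, retrying on nonzero exit status.
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, shell=True)
        if result.returncode == 0:
            print(f"Command completed after {attempt} attempt(s).")
            return
        if attempt < max_attempts:
            print(f"Attempt {attempt} failed; retrying in {retry_wait_seconds}s")
            time.sleep(retry_wait_seconds)
    sys.exit(result.returncode)

run_with_retry("python3 -m pip install requests==2.27.1 pyyaml==6.0.2")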
2025-12-04T11:13:45.2963969Z ##[group]Run set -x 2025-12-04T11:13:45.2964139Z set -x 2025-12-04T11:13:45.2964267Z  2025-12-04T11:13:45.2964504Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T11:13:45.2964756Z # in runner workspace 2025-12-04T11:13:45.2964972Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T11:13:45.2970211Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:45.2970376Z env: 2025-12-04T11:13:45.2970483Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:45.2970634Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:45.2970833Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:45.2971021Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:45.2971568Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:45.2972103Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:45.2972236Z AWS_REGION: us-east-1 2025-12-04T11:13:45.2972406Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:45.2972575Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:45.2974892Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:45.2975141Z ##[endgroup] 2025-12-04T11:13:45.3007843Z + python3 /home/runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T11:13:45.3096212Z Setting output branch=main 2025-12-04T11:13:45.3133125Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T11:13:45.3133353Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T11:13:45.3133536Z echo "Job name: ${JOB_NAME}" 2025-12-04T11:13:45.3133694Z  2025-12-04T11:13:45.3133893Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T11:13:45.3134132Z # in runner workspace 2025-12-04T11:13:45.3134373Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T11:13:45.3134623Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T11:13:45.3134793Z  --job-name "${JOB_NAME}" \ 2025-12-04T11:13:45.3136518Z  --test-matrix "{"include": [{"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}]}" \ 2025-12-04T11:13:45.3138466Z  --selected-test-configs "" \ 2025-12-04T11:13:45.3138641Z  --pr-number "${PR_NUMBER}" \ 
2025-12-04T11:13:45.3138814Z  --tag "${TAG}" \ 2025-12-04T11:13:45.3138977Z  --event-name "${EVENT_NAME}" \ 2025-12-04T11:13:45.3139141Z  --schedule "${SCHEDULE}" \ 2025-12-04T11:13:45.3139304Z  --branch "${HEAD_BRANCH}" 2025-12-04T11:13:45.3143666Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:45.3143824Z env: 2025-12-04T11:13:45.3143927Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:45.3144073Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:45.3144265Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:45.3144439Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:45.3144975Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:45.3145488Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:45.3145615Z AWS_REGION: us-east-1 2025-12-04T11:13:45.3145781Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:45.3145948Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:45.3148197Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:45.3148414Z GITHUB_TOKEN: *** 2025-12-04T11:13:45.3148658Z JOB_NAME: linux-noble-rocm-py3.12-mi300 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4.b, module:rocm, oncall:distributed, mem_leak_check) 2025-12-04T11:13:45.3148918Z PR_NUMBER: 2025-12-04T11:13:45.3149012Z TAG: 2025-12-04T11:13:45.3149102Z EVENT_NAME: schedule 2025-12-04T11:13:45.3149206Z SCHEDULE: 29 8 * * * 2025-12-04T11:13:45.3149308Z HEAD_BRANCH: main 2025-12-04T11:13:45.3149411Z ##[endgroup] 2025-12-04T11:13:45.3165489Z Workflow: periodic-rocm-mi300 2025-12-04T11:13:45.3165749Z Job name: linux-noble-rocm-py3.12-mi300 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4.b, module:rocm, oncall:distributed, mem_leak_check) 2025-12-04T11:13:45.9001703Z Setting output keep-going=True 2025-12-04T11:13:45.9001975Z Setting output ci-verbose-test-logs=False 2025-12-04T11:13:45.9002230Z Setting output ci-test-showlocals=False 2025-12-04T11:13:45.9002467Z Setting output ci-no-test-timeout=False 2025-12-04T11:13:45.9002695Z Setting output ci-no-td=False 2025-12-04T11:13:45.9002909Z Setting output ci-td-distributed=False 2025-12-04T11:13:45.9003136Z Setting output is-unstable=False 2025-12-04T11:13:45.9003368Z Setting output reenabled-issues= 2025-12-04T11:13:45.9008268Z Setting output test-matrix={"include": [{"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], 
"mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T11:13:45.9013151Z Setting output is-test-matrix-empty=False 2025-12-04T11:13:45.9108227Z ##[group]Run echo "Filtered matrix:" 2025-12-04T11:13:45.9108472Z echo "Filtered matrix:" 2025-12-04T11:13:45.9112207Z echo "{"include": [{"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], 
"mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "owners": ["module:rocm", "oncall:distributed"], "rerun_disabled_tests": "rerun_disabled_tests"}]}" 2025-12-04T11:13:45.9115682Z  2025-12-04T11:13:45.9115787Z echo 2025-12-04T11:13:45.9115930Z echo "Is the current job unstable? False" 2025-12-04T11:13:45.9116090Z  2025-12-04T11:13:45.9116191Z echo 2025-12-04T11:13:45.9116314Z echo "Is keep-going label set? True" 2025-12-04T11:13:45.9116464Z  2025-12-04T11:13:45.9116567Z echo 2025-12-04T11:13:45.9116683Z echo "Reenabled issues? " 2025-12-04T11:13:45.9120962Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:45.9121108Z env: 2025-12-04T11:13:45.9121202Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:45.9121338Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:45.9121513Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:45.9121684Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:45.9122184Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:45.9122662Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:45.9122779Z AWS_REGION: us-east-1 2025-12-04T11:13:45.9122941Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:45.9123142Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:45.9125242Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:45.9125350Z ##[endgroup] 2025-12-04T11:13:45.9143801Z Filtered matrix: 2025-12-04T11:13:45.9147385Z {include: [{config: distributed, shard: 1, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: 
distributed, shard: 2, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, owners: [module:rocm, oncall:distributed], rerun_disabled_tests: rerun_disabled_tests}]} 2025-12-04T11:13:45.9150573Z 2025-12-04T11:13:45.9150629Z Is the current job unstable? False 2025-12-04T11:13:45.9150713Z 2025-12-04T11:13:45.9150772Z Is keep-going label set? True 2025-12-04T11:13:45.9150850Z 2025-12-04T11:13:45.9150897Z Reenabled issues? 2025-12-04T11:13:45.9180623Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T11:13:45.9180846Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T11:13:45.9185352Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:45.9185506Z env: 2025-12-04T11:13:45.9185611Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:45.9185756Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:45.9185939Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:45.9186109Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:45.9186619Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:45.9187138Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:45.9187264Z AWS_REGION: us-east-1 2025-12-04T11:13:45.9187444Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:45.9187602Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:45.9189765Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:45.9189876Z JOB_TIMEOUT: 600 2025-12-04T11:13:45.9189982Z ##[endgroup] 2025-12-04T11:13:45.9242123Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:13:45.9242395Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:13:45.9242630Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T11:13:45.9247545Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T11:13:45.9247731Z env: 2025-12-04T11:13:45.9247871Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:45.9248039Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:45.9248266Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:45.9248481Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:45.9249122Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:45.9249925Z 
AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:45.9250076Z AWS_REGION: us-east-1 2025-12-04T11:13:45.9250289Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:45.9250485Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:45.9252760Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:45.9252873Z ##[endgroup] 2025-12-04T11:13:45.9331478Z ##[group]Run set -x 2025-12-04T11:13:45.9331623Z set -x 2025-12-04T11:13:45.9331753Z  2025-12-04T11:13:45.9331863Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T11:13:45.9332023Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T11:13:45.9332176Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T11:13:45.9332319Z  TEST_COMMAND=.ci/caffe2/test.sh 2025-12-04T11:13:45.9332439Z else 2025-12-04T11:13:45.9332543Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T11:13:45.9332661Z fi 2025-12-04T11:13:45.9332752Z  2025-12-04T11:13:45.9332888Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T11:13:45.9333099Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T11:13:45.9333284Z # Used for GPU_FLAG since that doesn't play nice 2025-12-04T11:13:45.9333451Z # shellcheck disable=SC2086,SC2090 2025-12-04T11:13:45.9333583Z container_name=$(docker run \ 2025-12-04T11:13:45.9333839Z  ${GPU_FLAG:-} \ 2025-12-04T11:13:45.9333953Z  -e BUILD_ENVIRONMENT \ 2025-12-04T11:13:45.9334074Z  -e PR_NUMBER \ 2025-12-04T11:13:45.9334185Z  -e GITHUB_ACTIONS \ 2025-12-04T11:13:45.9334302Z  -e GITHUB_REPOSITORY \ 2025-12-04T11:13:45.9334421Z  -e GITHUB_WORKFLOW \ 2025-12-04T11:13:45.9334536Z  -e GITHUB_JOB \ 2025-12-04T11:13:45.9334647Z  -e GITHUB_RUN_ID \ 2025-12-04T11:13:45.9334758Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T11:13:45.9334876Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T11:13:45.9334991Z  -e JOB_ID \ 2025-12-04T11:13:45.9335095Z  -e JOB_NAME \ 2025-12-04T11:13:45.9335201Z  -e BASE_SHA \ 2025-12-04T11:13:45.9335305Z  -e BRANCH \ 2025-12-04T11:13:45.9335407Z  -e SHA1 \ 2025-12-04T11:13:45.9335512Z  -e AWS_DEFAULT_REGION \ 2025-12-04T11:13:45.9335634Z  -e IN_WHEEL_TEST \ 2025-12-04T11:13:45.9335745Z  -e SHARD_NUMBER \ 2025-12-04T11:13:45.9335855Z  -e TEST_CONFIG \ 2025-12-04T11:13:45.9335965Z  -e NUM_TEST_SHARDS \ 2025-12-04T11:13:45.9347540Z  -e REENABLED_ISSUES \ 2025-12-04T11:13:45.9347703Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T11:13:45.9347838Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T11:13:45.9347973Z  -e TEST_SHOWLOCALS \ 2025-12-04T11:13:45.9348099Z  -e NO_TEST_TIMEOUT \ 2025-12-04T11:13:45.9348223Z  -e NO_TD \ 2025-12-04T11:13:45.9348354Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T11:13:45.9348512Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T11:13:45.9348657Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T11:13:45.9348801Z  -e TESTS_TO_INCLUDE \ 2025-12-04T11:13:45.9348931Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T11:13:45.9349060Z  -e DASHBOARD_TAG \ 2025-12-04T11:13:45.9349223Z  --env-file="${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T11:13:45.9349394Z  --ulimit stack=10485760:83886080 \ 2025-12-04T11:13:45.9349530Z  --ulimit core=0 \ 2025-12-04T11:13:45.9349679Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T11:13:45.9349882Z  --security-opt seccomp=unconfined \ 2025-12-04T11:13:45.9350028Z  --cap-add=SYS_PTRACE \ 2025-12-04T11:13:45.9350156Z  --shm-size="8g" \ 2025-12-04T11:13:45.9350272Z  --tty \ 2025-12-04T11:13:45.9350378Z  --detach \ 2025-12-04T11:13:45.9350494Z  --name="${container_name}" \ 2025-12-04T11:13:45.9350627Z  --user jenkins \ 2025-12-04T11:13:45.9350777Z  -v 
"${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T11:13:45.9350943Z  -w /var/lib/jenkins/workspace \ 2025-12-04T11:13:45.9351173Z  "${DOCKER_IMAGE}" 2025-12-04T11:13:45.9351290Z ) 2025-12-04T11:13:45.9351399Z # save container name for later step 2025-12-04T11:13:45.9351562Z echo "CONTAINER_NAME=${container_name}" >> "$GITHUB_ENV" 2025-12-04T11:13:45.9351836Z # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home 2025-12-04T11:13:45.9352185Z docker exec -t "${container_name}" sh -c "cd .. && cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" 2025-12-04T11:13:45.9355050Z shell: /usr/bin/bash -e {0} 2025-12-04T11:13:45.9355165Z env: 2025-12-04T11:13:45.9355263Z GIT_DEFAULT_BRANCH: main 2025-12-04T11:13:45.9355410Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T11:13:45.9355595Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T11:13:45.9355766Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T11:13:45.9356283Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T11:13:45.9356833Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T11:13:45.9356958Z AWS_REGION: us-east-1 2025-12-04T11:13:45.9357100Z AWS_ACCESS_KEY_ID: *** 2025-12-04T11:13:45.9357260Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T11:13:45.9359412Z AWS_SESSION_TOKEN: *** 2025-12-04T11:13:45.9359550Z BUILD_ENVIRONMENT: linux-noble-rocm-py3.12-mi300 2025-12-04T11:13:45.9359733Z PR_NUMBER: 2025-12-04T11:13:45.9359843Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T11:13:45.9359979Z GITHUB_WORKFLOW: periodic-rocm-mi300 2025-12-04T11:13:45.9360102Z GITHUB_JOB: test 2025-12-04T11:13:45.9360207Z GITHUB_RUN_ID: 19922798714 2025-12-04T11:13:45.9360323Z GITHUB_RUN_NUMBER: 1861 2025-12-04T11:13:45.9360438Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T11:13:45.9360552Z JOB_ID: 57117547552 2025-12-04T11:13:45.9360793Z JOB_NAME: linux-noble-rocm-py3.12-mi300 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4.b, module:rocm, oncall:distributed, mem_leak_check) 2025-12-04T11:13:45.9361047Z BRANCH: main 2025-12-04T11:13:45.9361161Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:45.9361314Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:45.9361453Z TEST_CONFIG: distributed 2025-12-04T11:13:45.9361558Z SHARD_NUMBER: 1 2025-12-04T11:13:45.9361660Z NUM_TEST_SHARDS: 3 2025-12-04T11:13:45.9361758Z REENABLED_ISSUES: 2025-12-04T11:13:45.9361865Z CONTINUE_THROUGH_ERROR: True 2025-12-04T11:13:45.9361979Z VERBOSE_TEST_LOGS: False 2025-12-04T11:13:45.9362086Z TEST_SHOWLOCALS: False 2025-12-04T11:13:45.9362192Z NO_TEST_TIMEOUT: False 2025-12-04T11:13:45.9362294Z NO_TD: False 2025-12-04T11:13:45.9362563Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:13:45.9362862Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T11:13:45.9362992Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T11:13:45.9363113Z TESTS_TO_INCLUDE: 2025-12-04T11:13:45.9363212Z DASHBOARD_TAG: 2025-12-04T11:13:45.9363357Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T11:13:45.9363473Z ##[endgroup] 2025-12-04T11:13:45.9376905Z + [[ distributed == 
\m\u\l\t\i\g\p\u ]] 2025-12-04T11:13:45.9377256Z + [[ linux-noble-rocm-py3.12-mi300 == *onnx* ]] 2025-12-04T11:13:45.9377529Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T11:13:45.9383211Z +++ nproc --ignore=2 2025-12-04T11:13:45.9392387Z ++ docker run --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e MAX_JOBS=126 -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e TESTS_TO_INCLUDE -e HUGGING_FACE_HUB_TOKEN -e DASHBOARD_TAG --env-file=/home/runner/_work/_temp/github_env_19922798714 --ulimit stack=10485760:83886080 --ulimit core=0 --env-file=/tmp/github_env_19922798714 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --shm-size=8g --tty --detach --name= --user jenkins -v /home/runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T11:13:46.1900431Z + container_name=80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T11:13:46.1900943Z + echo CONTAINER_NAME=80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T11:13:46.1901613Z + docker exec -t 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 sh -c 'cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && .ci/pytorch/test.sh' 2025-12-04T11:13:49.2624171Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp312-cp312-linux_x86_64.whl 2025-12-04T11:13:49.7800051Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T11:13:49.7801004Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T11:13:49.7802026Z Requirement already satisfied: setuptools in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (78.1.1) 2025-12-04T11:13:49.7802868Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T11:13:49.7804142Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T11:13:49.7805790Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T11:13:49.7807040Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T11:13:49.7853932Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T11:13:49.7872631Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T11:13:49.8974697Z Installing collected packages: torch 2025-12-04T11:13:55.3656238Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T11:13:55.4022686Z + export TERM=vt100 2025-12-04T11:13:55.4023971Z + TERM=vt100 2025-12-04T11:13:55.4027444Z ++ dirname .ci/pytorch/test.sh 2025-12-04T11:13:55.4034429Z + source .ci/pytorch/common.sh 2025-12-04T11:13:55.4038682Z +++ dirname .ci/pytorch/common.sh 2025-12-04T11:13:55.4047275Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T11:13:55.4049072Z +++ declare -f -t trap_add 2025-12-04T11:13:55.4054580Z ++ set -ex -o pipefail 2025-12-04T11:13:55.4054859Z ++ [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T11:13:55.4055127Z ++ unset HIP_PLATFORM 2025-12-04T11:13:55.4055345Z ++ export PYTORCH_TEST_WITH_ROCM=1 2025-12-04T11:13:55.4055590Z ++ PYTORCH_TEST_WITH_ROCM=1 2025-12-04T11:13:55.4055810Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T11:13:55.4060177Z ++ dirname .ci/pytorch/test.sh 2025-12-04T11:13:55.4066443Z + source .ci/pytorch/common-build.sh 2025-12-04T11:13:55.4068420Z ++ [[ linux-noble-rocm-py3.12-mi300 != *win-* ]] 2025-12-04T11:13:55.4077369Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T11:13:55.4086924Z +++ cd .ci/pytorch 2025-12-04T11:13:55.4087943Z +++ pwd -P 2025-12-04T11:13:55.4089758Z ++ script_dir=/var/lib/jenkins/pytorch/.ci/pytorch 2025-12-04T11:13:55.4090170Z ++ [[ linux-noble-rocm-py3.12-mi300 == *-pch* ]] 2025-12-04T11:13:55.4090472Z ++ which sccache 2025-12-04T11:13:55.4102482Z ++ [[ -z '' ]] 2025-12-04T11:13:55.4102674Z ++ unset SCCACHE_BUCKET 2025-12-04T11:13:55.4102873Z ++ unset SCCACHE_REGION 2025-12-04T11:13:55.4103072Z ++ sccache --stop-server 2025-12-04T11:13:55.4124916Z ++ true 2025-12-04T11:13:55.4125107Z ++ rm -f /var/lib/jenkins/sccache_error.log 
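The trace above reduces to a two-step launch pattern: a detached docker run that exposes the ROCm device nodes to an unprivileged container, then a docker exec that copies the read-only mounted workspace, installs the freshly built wheel, and hands off to .ci/pytorch/test.sh. The sketch below is a minimal re-creation of that pattern, not the workflow's exact script: the image name, mount path, and exec command are copied from the log, while the reduced flag set and the plain variable names are illustrative.

  IMAGE="308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a"

  run_flags=(
    --device=/dev/kfd                # ROCm compute driver node
    --device=/dev/dri/renderD128     # one render node per GPU handed to the job
    --group-add video                # lets the unprivileged user open the render nodes
    --shm-size=8g                    # distributed tests allocate large /dev/shm segments
    --tty --detach                   # keep the container alive for exec and later teardown
    --user jenkins
    -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace"
    -w /var/lib/jenkins/workspace
  )
  container_name=$(docker run "${run_flags[@]}" "${IMAGE}")

  # The jenkins user cannot write to the mounted workspace, so the tree is
  # copied inside the container before the wheel is installed and tests run.
  docker exec -t "${container_name}" sh -c \
    'cd .. && cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && .ci/pytorch/test.sh'

Detaching first and exec-ing second keeps the container alive for the teardown step even if the test command itself fails, which matches the "detached container should get cleaned up by teardown" comment in the step script above.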
2025-12-04T11:13:55.4131586Z ++ trap_add sccache_epilogue EXIT 2025-12-04T11:13:55.4131810Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T11:13:55.4132048Z ++ shift 2025-12-04T11:13:55.4132216Z ++ for trap_add_name in "$@" 2025-12-04T11:13:55.4136770Z ++++ trap -p EXIT 2025-12-04T11:13:55.4138974Z +++ eval 'extract_trap_cmd ' 2025-12-04T11:13:55.4139157Z ++++ extract_trap_cmd 2025-12-04T11:13:55.4139326Z ++++ printf '%s\n' '' 2025-12-04T11:13:55.4139495Z +++ printf '%s\n' sccache_epilogue 2025-12-04T11:13:55.4140627Z ++ trap -- ' 2025-12-04T11:13:55.4140811Z sccache_epilogue' EXIT 2025-12-04T11:13:55.4140968Z ++ [[ -n '' ]] 2025-12-04T11:13:55.4141141Z ++ [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T11:13:55.4141391Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T11:13:55.4141619Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T11:13:55.4141797Z ++ sccache --start-server 2025-12-04T11:13:55.4157919Z sccache: Starting the server... 2025-12-04T11:13:55.4357363Z sccache: Listening on address 127.0.0.1:4226 2025-12-04T11:13:55.4369832Z ++ sccache --zero-stats 2025-12-04T11:13:55.4381682Z Statistics zeroed. 2025-12-04T11:13:55.4382801Z ++ which ccache 2025-12-04T11:13:55.4390426Z + [[ linux-noble-rocm-py3.12-mi300 != *rocm* ]] 2025-12-04T11:13:55.4390643Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-12-04T11:13:55.4390830Z + echo 'Environment variables:' 2025-12-04T11:13:55.4390993Z Environment variables: 2025-12-04T11:13:55.4391133Z + env 2025-12-04T11:13:55.4397583Z GITHUB_WORKSPACE=/home/runner/_work/pytorch/pytorch 2025-12-04T11:13:55.4397781Z CONTINUE_THROUGH_ERROR=True 2025-12-04T11:13:55.4397994Z BUILD_ENVIRONMENT=linux-noble-rocm-py3.12-mi300 2025-12-04T11:13:55.4398233Z HOSTNAME=linux.rocm.gpu.gfx942.4.b-bphpw-runner-rf5f6 2025-12-04T11:13:55.4398557Z GITHUB_PATH=/home/runner/_work/_temp/_runner_file_commands/add_path_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4398813Z GITHUB_ACTION=__run_2 2025-12-04T11:13:55.4398956Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T11:13:55.4399114Z GITHUB_RUN_NUMBER=1861 2025-12-04T11:13:55.4399242Z TEST_CONFIG=distributed 2025-12-04T11:13:55.4399430Z RUNNER_NAME=linux.rocm.gpu.gfx942.4.b-bphpw-runner-rf5f6 2025-12-04T11:13:55.4399636Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T11:13:55.4399841Z AWS_DEFAULT_REGION=us-east-1 2025-12-04T11:13:55.4400027Z RUNNER_ARTIFACT_DIR=/home/runner/_work/_temp/artifacts 2025-12-04T11:13:55.4400229Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-12-04T11:13:55.4400393Z GITHUB_REF_TYPE=branch 2025-12-04T11:13:55.4400550Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:55.4400861Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T11:13:55.4401320Z *** 2025-12-04T11:13:55.4401441Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T11:13:55.4401590Z GITHUB_ACTIONS=true 2025-12-04T11:13:55.4401741Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:55.4401936Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:55.4402266Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic-rocm-mi300.yml@refs/heads/main 2025-12-04T11:13:55.4402528Z UCC_HOME=/usr 2025-12-04T11:13:55.4402660Z RUNNER_ENVIRONMENT=self-hosted 2025-12-04T11:13:55.4403004Z VERBOSE_TEST_LOGS=False 2025-12-04T11:13:55.4403151Z GITHUB_REF=refs/heads/main 2025-12-04T11:13:55.4403284Z RUNNER_OS=Linux 2025-12-04T11:13:55.4403409Z SHARD_NUMBER=1 2025-12-04T11:13:55.4403531Z GITHUB_REF_PROTECTED=true 2025-12-04T11:13:55.4403679Z RUNNER_MANUALLY_TRAP_SIG=1 2025-12-04T11:13:55.4403816Z 
HOME=/var/lib/jenkins 2025-12-04T11:13:55.4403968Z GITHUB_API_URL=https://api.github.com 2025-12-04T11:13:55.4404141Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T11:13:55.4404311Z RUNNER_DOCS_DIR=/home/runner/_work/_temp/docs 2025-12-04T11:13:55.4404486Z LANG=C.UTF-8 2025-12-04T11:13:55.4404624Z UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e 2025-12-04T11:13:55.4404806Z PYTORCH_TEST_WITH_ROCM=1 2025-12-04T11:13:55.4404983Z RUNNER_TRACKING_ID=github_6d7288f6-9c7a-4580-a89e-41c02329dd02 2025-12-04T11:13:55.4405177Z RUNNER_ARCH=X64 2025-12-04T11:13:55.4405312Z RUNNER_TEMP=/home/runner/_work/_temp 2025-12-04T11:13:55.4405464Z NUM_TEST_SHARDS=3 2025-12-04T11:13:55.4405580Z UCX_HOME=/usr 2025-12-04T11:13:55.4405826Z GITHUB_STATE=/home/runner/_work/_temp/_runner_file_commands/save_state_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4406334Z JOB_NAME=linux-noble-rocm-py3.12-mi300 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4.b, module:rocm, oncall:distributed, mem_leak_check) 2025-12-04T11:13:55.4406662Z MAGMA_HOME=/opt/rocm/magma 2025-12-04T11:13:55.4406911Z GITHUB_ENV=/home/runner/_work/_temp/_runner_file_commands/set_env_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4407224Z GITHUB_EVENT_PATH=/home/runner/_work/_temp/_github_workflow/event.json 2025-12-04T11:13:55.4407434Z GITHUB_EVENT_NAME=schedule 2025-12-04T11:13:55.4407637Z GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT=actions-runner-controller/0.12.1 2025-12-04T11:13:55.4407849Z DASHBOARD_TAG= 2025-12-04T11:13:55.4407976Z GITHUB_RUN_ID=19922798714 2025-12-04T11:13:55.4408248Z GITHUB_STEP_SUMMARY=/home/runner/_work/_temp/_runner_file_commands/step_summary_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4408542Z GITHUB_ACTOR=pytorchmergebot 2025-12-04T11:13:55.4408690Z PR_NUMBER= 2025-12-04T11:13:55.4408805Z GITHUB_RUN_ATTEMPT=1 2025-12-04T11:13:55.4408947Z ANACONDA_PYTHON_VERSION=3.12 2025-12-04T11:13:55.4409117Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T11:13:55.4409289Z TERM=vt100 2025-12-04T11:13:55.4409408Z INSTALLED_VISION=yes 2025-12-04T11:13:55.4409530Z BRANCH=main 2025-12-04T11:13:55.4409643Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T11:13:55.4409808Z TESTS_TO_INCLUDE= 2025-12-04T11:13:55.4409982Z GITHUB_ACTION_PATH=/home/runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-12-04T11:13:55.4410188Z GITHUB_SERVER_URL=https://github.com 2025-12-04T11:13:55.4410337Z PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx950;gfx1100 2025-12-04T11:13:55.4410500Z UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77 2025-12-04T11:13:55.4410646Z REENABLED_ISSUES= 2025-12-04T11:13:55.4410748Z SHLVL=1 2025-12-04T11:13:55.4410841Z MAX_JOBS=126 2025-12-04T11:13:55.4410982Z RUNNER_TEST_RESULTS_DIR=/home/runner/_work/_temp/test-results 2025-12-04T11:13:55.4411149Z GITHUB_ACTOR_ID=97764156 2025-12-04T11:13:55.4411276Z RUNNER_TOOL_CACHE=/home/runner/_work/_tool 2025-12-04T11:13:55.4411450Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:55.4411614Z GITHUB_REF_NAME=main 2025-12-04T11:13:55.4411723Z ROCM_PATH=/opt/rocm 2025-12-04T11:13:55.4411826Z GITHUB_JOB=test 2025-12-04T11:13:55.4411930Z NO_TEST_TIMEOUT=False 2025-12-04T11:13:55.4412051Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T11:13:55.4412178Z LC_ALL=C.UTF-8 2025-12-04T11:13:55.4412285Z GITHUB_RETENTION_DAYS=90 2025-12-04T11:13:55.4412413Z RUNNER_WORKSPACE=/home/runner/_work/pytorch 2025-12-04T11:13:55.4412552Z OPENSSL_DIR=/opt/openssl 2025-12-04T11:13:55.4412670Z GITHUB_ACTION_REPOSITORY= 2025-12-04T11:13:55.4413049Z 
PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T11:13:55.4413476Z GITHUB_BASE_REF= 2025-12-04T11:13:55.4413580Z CI=true 2025-12-04T11:13:55.4413703Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T11:13:55.4413826Z JOB_ID=57117547552 2025-12-04T11:13:55.4413928Z GITHUB_HEAD_REF= 2025-12-04T11:13:55.4414031Z GITHUB_ACTION_REF= 2025-12-04T11:13:55.4414136Z TEST_SHOWLOCALS=False 2025-12-04T11:13:55.4414259Z GITHUB_WORKFLOW=periodic-rocm-mi300 2025-12-04T11:13:55.4414397Z DEBIAN_FRONTEND=noninteractive 2025-12-04T11:13:55.4414621Z GITHUB_OUTPUT=/home/runner/_work/_temp/_runner_file_commands/set_output_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4414842Z NO_TD=False 2025-12-04T11:13:55.4414942Z OLDPWD=/var/lib/jenkins 2025-12-04T11:13:55.4415053Z _=/usr/bin/env 2025-12-04T11:13:55.4415204Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T11:13:55.4473531Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch 2025-12-04T11:13:55.4473811Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/bin 2025-12-04T11:13:55.4474028Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/lib 2025-12-04T11:13:55.4474323Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/test 2025-12-04T11:13:55.4474490Z + BUILD_DIR=build 2025-12-04T11:13:55.4474594Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T11:13:55.4474715Z + BUILD_BIN_DIR=build/bin 2025-12-04T11:13:55.4474820Z + SHARD_NUMBER=1 2025-12-04T11:13:55.4474915Z + NUM_TEST_SHARDS=3 2025-12-04T11:13:55.4475020Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T11:13:55.4475153Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T11:13:55.4475266Z + export VALGRIND=ON 2025-12-04T11:13:55.4475365Z + VALGRIND=ON 2025-12-04T11:13:55.4475477Z + [[ linux-noble-rocm-py3.12-mi300 == *clang9* ]] 2025-12-04T11:13:55.4475625Z + [[ linux-noble-rocm-py3.12-mi300 == *xpu* ]] 2025-12-04T11:13:55.4475751Z + detect_cuda_arch 2025-12-04T11:13:55.4475861Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-12-04T11:13:55.4476011Z + [[ linux-noble-rocm-py3.12-mi300 == *s390x* ]] 2025-12-04T11:13:55.4476139Z + [[ 0 == \1 ]] 2025-12-04T11:13:55.4476229Z + [[ True == \1 ]] 2025-12-04T11:13:55.4476340Z + [[ linux-noble-rocm-py3.12-mi300 != *bazel* ]] 2025-12-04T11:13:55.4480499Z ++ realpath build/custom_test_artifacts 2025-12-04T11:13:55.4490912Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/pytorch/build/custom_test_artifacts 2025-12-04T11:13:55.4491099Z + [[ -n '' ]] 2025-12-04T11:13:55.4491220Z + echo 'Environment variables' 2025-12-04T11:13:55.4491335Z Environment variables 2025-12-04T11:13:55.4491439Z + env 2025-12-04T11:13:55.4502090Z GITHUB_WORKSPACE=/home/runner/_work/pytorch/pytorch 2025-12-04T11:13:55.4502470Z CONTINUE_THROUGH_ERROR=True 2025-12-04T11:13:55.4502755Z BUILD_ENVIRONMENT=linux-noble-rocm-py3.12-mi300 2025-12-04T11:13:55.4503103Z HOSTNAME=linux.rocm.gpu.gfx942.4.b-bphpw-runner-rf5f6 2025-12-04T11:13:55.4503586Z GITHUB_PATH=/home/runner/_work/_temp/_runner_file_commands/add_path_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4504014Z GITHUB_ACTION=__run_2 2025-12-04T11:13:55.4504243Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T11:13:55.4504487Z GITHUB_RUN_NUMBER=1861 2025-12-04T11:13:55.4504692Z TEST_CONFIG=distributed 2025-12-04T11:13:55.4504975Z 
RUNNER_NAME=linux.rocm.gpu.gfx942.4.b-bphpw-runner-rf5f6 2025-12-04T11:13:55.4505294Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T11:13:55.4505548Z AWS_DEFAULT_REGION=us-east-1 2025-12-04T11:13:55.4505825Z RUNNER_ARTIFACT_DIR=/home/runner/_work/_temp/artifacts 2025-12-04T11:13:55.4506120Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-12-04T11:13:55.4506364Z GITHUB_REF_TYPE=branch 2025-12-04T11:13:55.4506609Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:55.4507002Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T11:13:55.4507269Z *** 2025-12-04T11:13:55.4507452Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T11:13:55.4507676Z GITHUB_ACTIONS=true 2025-12-04T11:13:55.4507904Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:55.4508448Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:55.4508850Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic-rocm-mi300.yml@refs/heads/main 2025-12-04T11:13:55.4509204Z UCC_HOME=/usr 2025-12-04T11:13:55.4509388Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T11:13:55.4509598Z RUNNER_ENVIRONMENT=self-hosted 2025-12-04T11:13:55.4509852Z VERBOSE_TEST_LOGS=False 2025-12-04T11:13:55.4510040Z GITHUB_REF=refs/heads/main 2025-12-04T11:13:55.4510223Z RUNNER_OS=Linux 2025-12-04T11:13:55.4510390Z SHARD_NUMBER=1 2025-12-04T11:13:55.4510566Z GITHUB_REF_PROTECTED=true 2025-12-04T11:13:55.4510764Z RUNNER_MANUALLY_TRAP_SIG=1 2025-12-04T11:13:55.4510952Z HOME=/var/lib/jenkins 2025-12-04T11:13:55.4511159Z GITHUB_API_URL=https://api.github.com 2025-12-04T11:13:55.4511399Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T11:13:55.4511638Z RUNNER_DOCS_DIR=/home/runner/_work/_temp/docs 2025-12-04T11:13:55.4511860Z LANG=C.UTF-8 2025-12-04T11:13:55.4512064Z UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e 2025-12-04T11:13:55.4512315Z PYTORCH_TEST_WITH_ROCM=1 2025-12-04T11:13:55.4512644Z RUNNER_TRACKING_ID=github_6d7288f6-9c7a-4580-a89e-41c02329dd02 2025-12-04T11:13:55.4512911Z RUNNER_ARCH=X64 2025-12-04T11:13:55.4513093Z RUNNER_TEMP=/home/runner/_work/_temp 2025-12-04T11:13:55.4513303Z NUM_TEST_SHARDS=3 2025-12-04T11:13:55.4513471Z UCX_HOME=/usr 2025-12-04T11:13:55.4513807Z GITHUB_STATE=/home/runner/_work/_temp/_runner_file_commands/save_state_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4514437Z JOB_NAME=linux-noble-rocm-py3.12-mi300 / test (distributed, 1, 3, linux.rocm.gpu.gfx942.4.b, module:rocm, oncall:distributed, mem_leak_check) 2025-12-04T11:13:55.4514889Z MAGMA_HOME=/opt/rocm/magma 2025-12-04T11:13:55.4515234Z GITHUB_ENV=/home/runner/_work/_temp/_runner_file_commands/set_env_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4515668Z GITHUB_EVENT_PATH=/home/runner/_work/_temp/_github_workflow/event.json 2025-12-04T11:13:55.4515960Z GITHUB_EVENT_NAME=schedule 2025-12-04T11:13:55.4516245Z GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT=actions-runner-controller/0.12.1 2025-12-04T11:13:55.4516537Z DASHBOARD_TAG= 2025-12-04T11:13:55.4516707Z GITHUB_RUN_ID=19922798714 2025-12-04T11:13:55.4517083Z GITHUB_STEP_SUMMARY=/home/runner/_work/_temp/_runner_file_commands/step_summary_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4517486Z GITHUB_ACTOR=pytorchmergebot 2025-12-04T11:13:55.4517681Z PR_NUMBER= 2025-12-04T11:13:55.4517845Z GITHUB_RUN_ATTEMPT=1 2025-12-04T11:13:55.4518017Z VALGRIND=ON 2025-12-04T11:13:55.4518183Z ANACONDA_PYTHON_VERSION=3.12 2025-12-04T11:13:55.4518418Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T11:13:55.4518589Z TERM=vt100 2025-12-04T11:13:55.4518712Z INSTALLED_VISION=yes 
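These in-container env dumps also show the earlier env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" step paying off: the GITHUB_* and CI variables visible inside the container arrived through the --env-file flags on the docker run above rather than through individual -e switches. A minimal sketch of that handoff, assuming any image with a shell (ubuntu:24.04 here is a stand-in, not the image this job uses):

  # Collect the host-side variables into a file Docker can replay.
  env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
  env | grep '^CI'     >> "/tmp/github_env_${GITHUB_RUN_ID}"

  # --env-file injects each NAME=value line from the file into the
  # container's environment; the single quotes defer expansion of
  # ${GITHUB_RUN_ID} until the shell inside the container runs.
  docker run --rm --env-file="/tmp/github_env_${GITHUB_RUN_ID}" ubuntu:24.04 \
    sh -c 'echo "GITHUB_RUN_ID inside the container: ${GITHUB_RUN_ID}"'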
2025-12-04T11:13:55.4518837Z BRANCH=main 2025-12-04T11:13:55.4518973Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T11:13:55.4519111Z TESTS_TO_INCLUDE= 2025-12-04T11:13:55.4519328Z GITHUB_ACTION_PATH=/home/runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-12-04T11:13:55.4519584Z GITHUB_SERVER_URL=https://github.com 2025-12-04T11:13:55.4519902Z PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx950;gfx1100 2025-12-04T11:13:55.4520112Z UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77 2025-12-04T11:13:55.4520287Z REENABLED_ISSUES= 2025-12-04T11:13:55.4520413Z SHLVL=1 2025-12-04T11:13:55.4520524Z MAX_JOBS=126 2025-12-04T11:13:55.4520691Z RUNNER_TEST_RESULTS_DIR=/home/runner/_work/_temp/test-results 2025-12-04T11:13:55.4520895Z GITHUB_ACTOR_ID=97764156 2025-12-04T11:13:55.4521048Z RUNNER_TOOL_CACHE=/home/runner/_work/_tool 2025-12-04T11:13:55.4521259Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T11:13:55.4521455Z GITHUB_REF_NAME=main 2025-12-04T11:13:55.4521592Z ROCM_PATH=/opt/rocm 2025-12-04T11:13:55.4521712Z GITHUB_JOB=test 2025-12-04T11:13:55.4521844Z NO_TEST_TIMEOUT=False 2025-12-04T11:13:55.4521991Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T11:13:55.4522145Z LC_ALL=C.UTF-8 2025-12-04T11:13:55.4522268Z GITHUB_RETENTION_DAYS=90 2025-12-04T11:13:55.4522481Z RUNNER_WORKSPACE=/home/runner/_work/pytorch 2025-12-04T11:13:55.4522660Z OPENSSL_DIR=/opt/openssl 2025-12-04T11:13:55.4522802Z GITHUB_ACTION_REPOSITORY= 2025-12-04T11:13:55.4523266Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T11:13:55.4523729Z GITHUB_BASE_REF= 2025-12-04T11:13:55.4523852Z CI=true 2025-12-04T11:13:55.4523979Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T11:13:55.4524126Z JOB_ID=57117547552 2025-12-04T11:13:55.4524252Z GITHUB_HEAD_REF= 2025-12-04T11:13:55.4524373Z GITHUB_ACTION_REF= 2025-12-04T11:13:55.4524507Z TEST_SHOWLOCALS=False 2025-12-04T11:13:55.4524653Z GITHUB_WORKFLOW=periodic-rocm-mi300 2025-12-04T11:13:55.4524818Z DEBIAN_FRONTEND=noninteractive 2025-12-04T11:13:55.4525084Z GITHUB_OUTPUT=/home/runner/_work/_temp/_runner_file_commands/set_output_a9668506-00d1-4e27-bbf1-4f523f8d7ef5 2025-12-04T11:13:55.4525353Z NO_TD=False 2025-12-04T11:13:55.4525482Z OLDPWD=/var/lib/jenkins 2025-12-04T11:13:55.4525652Z _=/usr/bin/env 2025-12-04T11:13:55.4525785Z + echo 'Testing pytorch' 2025-12-04T11:13:55.4525913Z Testing pytorch 2025-12-04T11:13:55.4526045Z + export LANG=C.UTF-8 2025-12-04T11:13:55.4526170Z + LANG=C.UTF-8 2025-12-04T11:13:55.4526297Z + PR_NUMBER= 2025-12-04T11:13:55.4526422Z + [[ distributed == \d\e\f\a\u\l\t ]] 2025-12-04T11:13:55.4526596Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T11:13:55.4526778Z + [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T11:13:55.4526964Z + export HIP_VISIBLE_DEVICES=0,1,2,3 2025-12-04T11:13:55.4527127Z + HIP_VISIBLE_DEVICES=0,1,2,3 2025-12-04T11:13:55.4527279Z + [[ distributed == \s\l\o\w ]] 2025-12-04T11:13:55.4527469Z + [[ linux-noble-rocm-py3.12-mi300 == *slow-gradcheck* ]] 2025-12-04T11:13:55.4527676Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-12-04T11:13:55.4527857Z + [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T11:13:55.4528050Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T11:13:55.4528230Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T11:13:55.4528402Z + [[ distributed == *crossref* ]] 2025-12-04T11:13:55.4528568Z + 
[[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T11:13:55.4528735Z + export VALGRIND=OFF 2025-12-04T11:13:55.4528870Z + VALGRIND=OFF 2025-12-04T11:13:55.4528984Z + rocminfo 2025-12-04T11:13:55.4628775Z ROCk module version 6.12.12 is loaded 2025-12-04T11:13:55.5366019Z ===================== 2025-12-04T11:13:55.5366205Z HSA System Attributes 2025-12-04T11:13:55.5366361Z ===================== 2025-12-04T11:13:55.5366508Z Runtime Version: 1.18 2025-12-04T11:13:55.5366655Z Runtime Ext Version: 1.14 2025-12-04T11:13:55.5366810Z System Timestamp Freq.: 1000.000000MHz 2025-12-04T11:13:55.5367055Z Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-12-04T11:13:55.5367322Z Machine Model: LARGE 2025-12-04T11:13:55.5367541Z System Endianness: LITTLE 2025-12-04T11:13:55.5367729Z Mwaitx: DISABLED 2025-12-04T11:13:55.5367890Z XNACK enabled: NO 2025-12-04T11:13:55.5368197Z DMAbuf Support: YES 2025-12-04T11:13:55.5368341Z VMM Support: YES 2025-12-04T11:13:55.5368433Z 2025-12-04T11:13:55.5368542Z ========== 2025-12-04T11:13:55.5368681Z HSA Agents 2025-12-04T11:13:55.5368841Z ========== 2025-12-04T11:13:55.5368983Z ******* 2025-12-04T11:13:55.5369106Z Agent 1 2025-12-04T11:13:55.5369230Z ******* 2025-12-04T11:13:55.5369390Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:13:55.5369584Z Uuid: CPU-XX 2025-12-04T11:13:55.5369930Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:13:55.5370234Z Vendor Name: CPU 2025-12-04T11:13:55.5370434Z Feature: None specified 2025-12-04T11:13:55.5370636Z Profile: FULL_PROFILE 2025-12-04T11:13:55.5370836Z Float Round Mode: NEAR 2025-12-04T11:13:55.5371051Z Max Queue Number: 0(0x0) 2025-12-04T11:13:55.5371253Z Queue Min Size: 0(0x0) 2025-12-04T11:13:55.5371452Z Queue Max Size: 0(0x0) 2025-12-04T11:13:55.5371646Z Queue Type: MULTI 2025-12-04T11:13:55.5371830Z Node: 0 2025-12-04T11:13:55.5372018Z Device Type: CPU 2025-12-04T11:13:55.5372192Z Cache Info: 2025-12-04T11:13:55.5372345Z L1: 49152(0xc000) KB 2025-12-04T11:13:55.5372534Z Chip ID: 0(0x0) 2025-12-04T11:13:55.5372776Z ASIC Revision: 0(0x0) 2025-12-04T11:13:55.5372974Z Cacheline Size: 64(0x40) 2025-12-04T11:13:55.5373171Z Max Clock Freq. (MHz): 3300 2025-12-04T11:13:55.5373360Z BDFID: 0 2025-12-04T11:13:55.5373560Z Internal Node ID: 0 2025-12-04T11:13:55.5373761Z Compute Unit: 64 2025-12-04T11:13:55.5373953Z SIMDs per CU: 0 2025-12-04T11:13:55.5374153Z Shader Engines: 0 2025-12-04T11:13:55.5374357Z Shader Arrs. per Eng.: 0 2025-12-04T11:13:55.5374570Z WatchPts on Addr. 
Ranges:1 2025-12-04T11:13:55.5374753Z Memory Properties: 2025-12-04T11:13:55.5374896Z Features: None 2025-12-04T11:13:55.5375041Z Pool Info: 2025-12-04T11:13:55.5375177Z Pool 1 2025-12-04T11:13:55.5375354Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:13:55.5375555Z Size: 1584733176(0x5e751bf8) KB 2025-12-04T11:13:55.5375765Z Allocatable: TRUE 2025-12-04T11:13:55.5375970Z Alloc Granule: 4KB 2025-12-04T11:13:55.5376183Z Alloc Recommended Granule:4KB 2025-12-04T11:13:55.5376404Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5376613Z Accessible by all: TRUE 2025-12-04T11:13:55.5376792Z Pool 2 2025-12-04T11:13:55.5376962Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:13:55.5377161Z Size: 1584733176(0x5e751bf8) KB 2025-12-04T11:13:55.5377356Z Allocatable: TRUE 2025-12-04T11:13:55.5377559Z Alloc Granule: 4KB 2025-12-04T11:13:55.5377768Z Alloc Recommended Granule:4KB 2025-12-04T11:13:55.5377981Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5378185Z Accessible by all: TRUE 2025-12-04T11:13:55.5378362Z Pool 3 2025-12-04T11:13:55.5378529Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T11:13:55.5378723Z Size: 1584733176(0x5e751bf8) KB 2025-12-04T11:13:55.5378911Z Allocatable: TRUE 2025-12-04T11:13:55.5379111Z Alloc Granule: 4KB 2025-12-04T11:13:55.5379359Z Alloc Recommended Granule:4KB 2025-12-04T11:13:55.5379541Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5379751Z Accessible by all: TRUE 2025-12-04T11:13:55.5379897Z Pool 4 2025-12-04T11:13:55.5380032Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:13:55.5380187Z Size: 1584733176(0x5e751bf8) KB 2025-12-04T11:13:55.5380339Z Allocatable: TRUE 2025-12-04T11:13:55.5380502Z Alloc Granule: 4KB 2025-12-04T11:13:55.5380671Z Alloc Recommended Granule:4KB 2025-12-04T11:13:55.5380843Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5381011Z Accessible by all: TRUE 2025-12-04T11:13:55.5381161Z ISA Info: 2025-12-04T11:13:55.5381307Z ******* 2025-12-04T11:13:55.5381416Z Agent 2 2025-12-04T11:13:55.5381520Z ******* 2025-12-04T11:13:55.5381646Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:13:55.5381799Z Uuid: CPU-XX 2025-12-04T11:13:55.5381963Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:13:55.5382132Z Vendor Name: CPU 2025-12-04T11:13:55.5382293Z Feature: None specified 2025-12-04T11:13:55.5382453Z Profile: FULL_PROFILE 2025-12-04T11:13:55.5382618Z Float Round Mode: NEAR 2025-12-04T11:13:55.5382784Z Max Queue Number: 0(0x0) 2025-12-04T11:13:55.5382950Z Queue Min Size: 0(0x0) 2025-12-04T11:13:55.5383111Z Queue Max Size: 0(0x0) 2025-12-04T11:13:55.5383274Z Queue Type: MULTI 2025-12-04T11:13:55.5383423Z Node: 1 2025-12-04T11:13:55.5383574Z Device Type: CPU 2025-12-04T11:13:55.5383715Z Cache Info: 2025-12-04T11:13:55.5383842Z L1: 49152(0xc000) KB 2025-12-04T11:13:55.5383987Z Chip ID: 0(0x0) 2025-12-04T11:13:55.5384140Z ASIC Revision: 0(0x0) 2025-12-04T11:13:55.5384301Z Cacheline Size: 64(0x40) 2025-12-04T11:13:55.5384465Z Max Clock Freq. (MHz): 3300 2025-12-04T11:13:55.5384622Z BDFID: 0 2025-12-04T11:13:55.5384782Z Internal Node ID: 1 2025-12-04T11:13:55.5384950Z Compute Unit: 64 2025-12-04T11:13:55.5385106Z SIMDs per CU: 0 2025-12-04T11:13:55.5385270Z Shader Engines: 0 2025-12-04T11:13:55.5385435Z Shader Arrs. per Eng.: 0 2025-12-04T11:13:55.5385603Z WatchPts on Addr. 
Ranges:1 2025-12-04T11:13:55.5385752Z Memory Properties: 2025-12-04T11:13:55.5385870Z Features: None 2025-12-04T11:13:55.5385984Z Pool Info: 2025-12-04T11:13:55.5386095Z Pool 1 2025-12-04T11:13:55.5386231Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:13:55.5386391Z Size: 1585355620(0x5e7e9b64) KB 2025-12-04T11:13:55.5386594Z Allocatable: TRUE 2025-12-04T11:13:55.5386759Z Alloc Granule: 4KB 2025-12-04T11:13:55.5386934Z Alloc Recommended Granule:4KB 2025-12-04T11:13:55.5387105Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5387275Z Accessible by all: TRUE 2025-12-04T11:13:55.5387419Z Pool 2 2025-12-04T11:13:55.5387557Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:13:55.5387716Z Size: 1585355620(0x5e7e9b64) KB 2025-12-04T11:13:55.5387873Z Allocatable: TRUE 2025-12-04T11:13:55.5388036Z Alloc Granule: 4KB 2025-12-04T11:13:55.5388207Z Alloc Recommended Granule:4KB 2025-12-04T11:13:55.5388379Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5388546Z Accessible by all: TRUE 2025-12-04T11:13:55.5388717Z Pool 3 2025-12-04T11:13:55.5388855Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T11:13:55.5389011Z Size: 1585355620(0x5e7e9b64) KB 2025-12-04T11:13:55.5389167Z Allocatable: TRUE 2025-12-04T11:13:55.5389329Z Alloc Granule: 4KB 2025-12-04T11:13:55.5389494Z Alloc Recommended Granule:4KB 2025-12-04T11:13:55.5389660Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5389858Z Accessible by all: TRUE 2025-12-04T11:13:55.5389997Z Pool 4 2025-12-04T11:13:55.5390127Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:13:55.5390279Z Size: 1585355620(0x5e7e9b64) KB 2025-12-04T11:13:55.5390429Z Allocatable: TRUE 2025-12-04T11:13:55.5390587Z Alloc Granule: 4KB 2025-12-04T11:13:55.5390753Z Alloc Recommended Granule:4KB 2025-12-04T11:13:55.5390918Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5391081Z Accessible by all: TRUE 2025-12-04T11:13:55.5391223Z ISA Info: 2025-12-04T11:13:55.5391328Z ******* 2025-12-04T11:13:55.5391430Z Agent 3 2025-12-04T11:13:55.5391529Z ******* 2025-12-04T11:13:55.5391644Z Name: gfx942 2025-12-04T11:13:55.5391790Z Uuid: GPU-fc3883f959874ad9 2025-12-04T11:13:55.5391949Z Marketing Name: AMD Radeon Graphics 2025-12-04T11:13:55.5392110Z Vendor Name: AMD 2025-12-04T11:13:55.5392268Z Feature: KERNEL_DISPATCH 2025-12-04T11:13:55.5392424Z Profile: BASE_PROFILE 2025-12-04T11:13:55.5392579Z Float Round Mode: NEAR 2025-12-04T11:13:55.5392736Z Max Queue Number: 128(0x80) 2025-12-04T11:13:55.5392889Z Queue Min Size: 64(0x40) 2025-12-04T11:13:55.5393043Z Queue Max Size: 131072(0x20000) 2025-12-04T11:13:55.5393194Z Queue Type: MULTI 2025-12-04T11:13:55.5393339Z Node: 2 2025-12-04T11:13:55.5393484Z Device Type: GPU 2025-12-04T11:13:55.5393619Z Cache Info: 2025-12-04T11:13:55.5393781Z L1: 32(0x20) KB 2025-12-04T11:13:55.5393921Z L2: 4096(0x1000) KB 2025-12-04T11:13:55.5394054Z L3: 262144(0x40000) KB 2025-12-04T11:13:55.5394265Z Chip ID: 29861(0x74a5) 2025-12-04T11:13:55.5394432Z ASIC Revision: 1(0x1) 2025-12-04T11:13:55.5394595Z Cacheline Size: 128(0x80) 2025-12-04T11:13:55.5394758Z Max Clock Freq. (MHz): 2100 2025-12-04T11:13:55.5394913Z BDFID: 29952 2025-12-04T11:13:55.5395070Z Internal Node ID: 2 2025-12-04T11:13:55.5395240Z Compute Unit: 304 2025-12-04T11:13:55.5395469Z SIMDs per CU: 4 2025-12-04T11:13:55.5395633Z Shader Engines: 32 2025-12-04T11:13:55.5395866Z Shader Arrs. per Eng.: 1 2025-12-04T11:13:55.5396096Z WatchPts on Addr. 
Ranges:4 2025-12-04T11:13:55.5396258Z Coherent Host Access: FALSE 2025-12-04T11:13:55.5396517Z Memory Properties: 2025-12-04T11:13:55.5396646Z Features: KERNEL_DISPATCH 2025-12-04T11:13:55.5396826Z Fast F16 Operation: TRUE 2025-12-04T11:13:55.5396993Z Wavefront Size: 64(0x40) 2025-12-04T11:13:55.5397157Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5397311Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5397514Z x 1024(0x400) 2025-12-04T11:13:55.5397703Z y 1024(0x400) 2025-12-04T11:13:55.5397851Z z 1024(0x400) 2025-12-04T11:13:55.5398025Z Max Waves Per CU: 32(0x20) 2025-12-04T11:13:55.5398188Z Max Work-item Per CU: 2048(0x800) 2025-12-04T11:13:55.5398367Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5398514Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5398680Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5398815Z y 65535(0xffff) 2025-12-04T11:13:55.5399017Z z 65535(0xffff) 2025-12-04T11:13:55.5399171Z Max fbarriers/Workgrp: 32 2025-12-04T11:13:55.5399403Z Packet Processor uCode:: 185 2025-12-04T11:13:55.5399573Z SDMA engine uCode:: 24 2025-12-04T11:13:55.5399828Z IOMMU Support:: None 2025-12-04T11:13:55.5399976Z Pool Info: 2025-12-04T11:13:55.5400085Z Pool 1 2025-12-04T11:13:55.5400224Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:13:55.5400383Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5400596Z Allocatable: TRUE 2025-12-04T11:13:55.5400763Z Alloc Granule: 4KB 2025-12-04T11:13:55.5400934Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5412057Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5412244Z Accessible by all: FALSE 2025-12-04T11:13:55.5412389Z Pool 2 2025-12-04T11:13:55.5412527Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:13:55.5412769Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5413020Z Allocatable: TRUE 2025-12-04T11:13:55.5413186Z Alloc Granule: 4KB 2025-12-04T11:13:55.5413346Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5413507Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5413664Z Accessible by all: FALSE 2025-12-04T11:13:55.5413802Z Pool 3 2025-12-04T11:13:55.5413930Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:13:55.5414080Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5414226Z Allocatable: TRUE 2025-12-04T11:13:55.5414379Z Alloc Granule: 4KB 2025-12-04T11:13:55.5414542Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5414742Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5414899Z Accessible by all: FALSE 2025-12-04T11:13:55.5415036Z Pool 4 2025-12-04T11:13:55.5415157Z Segment: GROUP 2025-12-04T11:13:55.5415370Z Size: 64(0x40) KB 2025-12-04T11:13:55.5415514Z Allocatable: FALSE 2025-12-04T11:13:55.5415667Z Alloc Granule: 0KB 2025-12-04T11:13:55.5415836Z Alloc Recommended Granule:0KB 2025-12-04T11:13:55.5415996Z Alloc Alignment: 0KB 2025-12-04T11:13:55.5416153Z Accessible by all: FALSE 2025-12-04T11:13:55.5416291Z ISA Info: 2025-12-04T11:13:55.5416396Z ISA 1 2025-12-04T11:13:55.5416533Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T11:13:55.5416698Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:13:55.5416866Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:13:55.5417033Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5417202Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5417398Z Fast f16: TRUE 2025-12-04T11:13:55.5417563Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5417710Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5417844Z x 1024(0x400) 2025-12-04T11:13:55.5417982Z y 1024(0x400) 2025-12-04T11:13:55.5418120Z z 1024(0x400) 2025-12-04T11:13:55.5418271Z 
Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5418416Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5418546Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5418735Z y 65535(0xffff) 2025-12-04T11:13:55.5418875Z z 65535(0xffff) 2025-12-04T11:13:55.5419027Z FBarrier Max Size: 32 2025-12-04T11:13:55.5419170Z ISA 2 2025-12-04T11:13:55.5419319Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T11:13:55.5419599Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:13:55.5419862Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:13:55.5420074Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5420334Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5420491Z Fast f16: TRUE 2025-12-04T11:13:55.5420649Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5420802Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5420936Z x 1024(0x400) 2025-12-04T11:13:55.5421083Z y 1024(0x400) 2025-12-04T11:13:55.5421216Z z 1024(0x400) 2025-12-04T11:13:55.5421365Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5421510Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5421636Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5421775Z y 65535(0xffff) 2025-12-04T11:13:55.5421942Z z 65535(0xffff) 2025-12-04T11:13:55.5422091Z FBarrier Max Size: 32 2025-12-04T11:13:55.5422232Z ******* 2025-12-04T11:13:55.5422341Z Agent 4 2025-12-04T11:13:55.5422444Z ******* 2025-12-04T11:13:55.5422574Z Name: gfx942 2025-12-04T11:13:55.5422811Z Uuid: GPU-cc3748ee0baeca85 2025-12-04T11:13:55.5422998Z Marketing Name: AMD Radeon Graphics 2025-12-04T11:13:55.5423268Z Vendor Name: AMD 2025-12-04T11:13:55.5423436Z Feature: KERNEL_DISPATCH 2025-12-04T11:13:55.5423603Z Profile: BASE_PROFILE 2025-12-04T11:13:55.5423799Z Float Round Mode: NEAR 2025-12-04T11:13:55.5423960Z Max Queue Number: 128(0x80) 2025-12-04T11:13:55.5424123Z Queue Min Size: 64(0x40) 2025-12-04T11:13:55.5424287Z Queue Max Size: 131072(0x20000) 2025-12-04T11:13:55.5424455Z Queue Type: MULTI 2025-12-04T11:13:55.5424605Z Node: 3 2025-12-04T11:13:55.5424759Z Device Type: GPU 2025-12-04T11:13:55.5424900Z Cache Info: 2025-12-04T11:13:55.5425031Z L1: 32(0x20) KB 2025-12-04T11:13:55.5425173Z L2: 4096(0x1000) KB 2025-12-04T11:13:55.5425317Z L3: 262144(0x40000) KB 2025-12-04T11:13:55.5425461Z Chip ID: 29861(0x74a5) 2025-12-04T11:13:55.5425618Z ASIC Revision: 1(0x1) 2025-12-04T11:13:55.5425791Z Cacheline Size: 128(0x80) 2025-12-04T11:13:55.5425957Z Max Clock Freq. (MHz): 2100 2025-12-04T11:13:55.5426120Z BDFID: 1280 2025-12-04T11:13:55.5426281Z Internal Node ID: 3 2025-12-04T11:13:55.5426441Z Compute Unit: 304 2025-12-04T11:13:55.5426599Z SIMDs per CU: 4 2025-12-04T11:13:55.5426760Z Shader Engines: 32 2025-12-04T11:13:55.5426928Z Shader Arrs. per Eng.: 1 2025-12-04T11:13:55.5427102Z WatchPts on Addr. 
Ranges:4 2025-12-04T11:13:55.5427280Z Coherent Host Access: FALSE 2025-12-04T11:13:55.5427468Z Memory Properties: 2025-12-04T11:13:55.5427601Z Features: KERNEL_DISPATCH 2025-12-04T11:13:55.5427754Z Fast F16 Operation: TRUE 2025-12-04T11:13:55.5427922Z Wavefront Size: 64(0x40) 2025-12-04T11:13:55.5428092Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5428250Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5428389Z x 1024(0x400) 2025-12-04T11:13:55.5428531Z y 1024(0x400) 2025-12-04T11:13:55.5428671Z z 1024(0x400) 2025-12-04T11:13:55.5428826Z Max Waves Per CU: 32(0x20) 2025-12-04T11:13:55.5428995Z Max Work-item Per CU: 2048(0x800) 2025-12-04T11:13:55.5429166Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5429322Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5429483Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5429627Z y 65535(0xffff) 2025-12-04T11:13:55.5429804Z z 65535(0xffff) 2025-12-04T11:13:55.5429959Z Max fbarriers/Workgrp: 32 2025-12-04T11:13:55.5430135Z Packet Processor uCode:: 185 2025-12-04T11:13:55.5430310Z SDMA engine uCode:: 24 2025-12-04T11:13:55.5430478Z IOMMU Support:: None 2025-12-04T11:13:55.5430622Z Pool Info: 2025-12-04T11:13:55.5430739Z Pool 1 2025-12-04T11:13:55.5430883Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:13:55.5431043Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5431203Z Allocatable: TRUE 2025-12-04T11:13:55.5431372Z Alloc Granule: 4KB 2025-12-04T11:13:55.5431542Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5431721Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5431894Z Accessible by all: FALSE 2025-12-04T11:13:55.5432041Z Pool 2 2025-12-04T11:13:55.5432182Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:13:55.5432340Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5432496Z Allocatable: TRUE 2025-12-04T11:13:55.5432660Z Alloc Granule: 4KB 2025-12-04T11:13:55.5432830Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5432996Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5433161Z Accessible by all: FALSE 2025-12-04T11:13:55.5433305Z Pool 3 2025-12-04T11:13:55.5433442Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:13:55.5433596Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5433753Z Allocatable: TRUE 2025-12-04T11:13:55.5433916Z Alloc Granule: 4KB 2025-12-04T11:13:55.5434085Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5434252Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5434414Z Accessible by all: FALSE 2025-12-04T11:13:55.5434557Z Pool 4 2025-12-04T11:13:55.5434737Z Segment: GROUP 2025-12-04T11:13:55.5434890Z Size: 64(0x40) KB 2025-12-04T11:13:55.5435041Z Allocatable: FALSE 2025-12-04T11:13:55.5435203Z Alloc Granule: 0KB 2025-12-04T11:13:55.5435370Z Alloc Recommended Granule:0KB 2025-12-04T11:13:55.5435540Z Alloc Alignment: 0KB 2025-12-04T11:13:55.5435705Z Accessible by all: FALSE 2025-12-04T11:13:55.5435846Z ISA Info: 2025-12-04T11:13:55.5435954Z ISA 1 2025-12-04T11:13:55.5436089Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T11:13:55.5436258Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:13:55.5436423Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:13:55.5436588Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5436793Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5436950Z Fast f16: TRUE 2025-12-04T11:13:55.5437105Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5437253Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5437387Z x 1024(0x400) 2025-12-04T11:13:55.5437521Z y 1024(0x400) 2025-12-04T11:13:55.5437653Z z 1024(0x400) 2025-12-04T11:13:55.5437798Z 
Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5437941Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5438069Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5438206Z y 65535(0xffff) 2025-12-04T11:13:55.5438343Z z 65535(0xffff) 2025-12-04T11:13:55.5438492Z FBarrier Max Size: 32 2025-12-04T11:13:55.5438632Z ISA 2 2025-12-04T11:13:55.5438776Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T11:13:55.5438954Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:13:55.5439121Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:13:55.5439279Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5439442Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5439593Z Fast f16: TRUE 2025-12-04T11:13:55.5439781Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5439930Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5440064Z x 1024(0x400) 2025-12-04T11:13:55.5440193Z y 1024(0x400) 2025-12-04T11:13:55.5440324Z z 1024(0x400) 2025-12-04T11:13:55.5440467Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5440610Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5440736Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5440865Z y 65535(0xffff) 2025-12-04T11:13:55.5440995Z z 65535(0xffff) 2025-12-04T11:13:55.5441138Z FBarrier Max Size: 32 2025-12-04T11:13:55.5441273Z ******* 2025-12-04T11:13:55.5441374Z Agent 5 2025-12-04T11:13:55.5441511Z ******* 2025-12-04T11:13:55.5441630Z Name: gfx942 2025-12-04T11:13:55.5441776Z Uuid: GPU-c0ef8e6d11fbb7b6 2025-12-04T11:13:55.5441931Z Marketing Name: AMD Radeon Graphics 2025-12-04T11:13:55.5442089Z Vendor Name: AMD 2025-12-04T11:13:55.5442240Z Feature: KERNEL_DISPATCH 2025-12-04T11:13:55.5442393Z Profile: BASE_PROFILE 2025-12-04T11:13:55.5442548Z Float Round Mode: NEAR 2025-12-04T11:13:55.5442703Z Max Queue Number: 128(0x80) 2025-12-04T11:13:55.5442861Z Queue Min Size: 64(0x40) 2025-12-04T11:13:55.5443016Z Queue Max Size: 131072(0x20000) 2025-12-04T11:13:55.5443172Z Queue Type: MULTI 2025-12-04T11:13:55.5443318Z Node: 4 2025-12-04T11:13:55.5443501Z Device Type: GPU 2025-12-04T11:13:55.5443635Z Cache Info: 2025-12-04T11:13:55.5443751Z L1: 32(0x20) KB 2025-12-04T11:13:55.5443885Z L2: 4096(0x1000) KB 2025-12-04T11:13:55.5444015Z L3: 262144(0x40000) KB 2025-12-04T11:13:55.5444150Z Chip ID: 29861(0x74a5) 2025-12-04T11:13:55.5444296Z ASIC Revision: 1(0x1) 2025-12-04T11:13:55.5444449Z Cacheline Size: 128(0x80) 2025-12-04T11:13:55.5444602Z Max Clock Freq. (MHz): 2100 2025-12-04T11:13:55.5444747Z BDFID: 25856 2025-12-04T11:13:55.5444898Z Internal Node ID: 4 2025-12-04T11:13:55.5445053Z Compute Unit: 304 2025-12-04T11:13:55.5445208Z SIMDs per CU: 4 2025-12-04T11:13:55.5445365Z Shader Engines: 32 2025-12-04T11:13:55.5445524Z Shader Arrs. per Eng.: 1 2025-12-04T11:13:55.5445684Z WatchPts on Addr. 
Ranges:4 2025-12-04T11:13:55.5445843Z Coherent Host Access: FALSE 2025-12-04T11:13:55.5445986Z Memory Properties: 2025-12-04T11:13:55.5446103Z Features: KERNEL_DISPATCH 2025-12-04T11:13:55.5446246Z Fast F16 Operation: TRUE 2025-12-04T11:13:55.5446401Z Wavefront Size: 64(0x40) 2025-12-04T11:13:55.5446555Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5446700Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5446828Z x 1024(0x400) 2025-12-04T11:13:55.5446958Z y 1024(0x400) 2025-12-04T11:13:55.5447083Z z 1024(0x400) 2025-12-04T11:13:55.5447227Z Max Waves Per CU: 32(0x20) 2025-12-04T11:13:55.5447383Z Max Work-item Per CU: 2048(0x800) 2025-12-04T11:13:55.5447537Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5447675Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5447792Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5447922Z y 65535(0xffff) 2025-12-04T11:13:55.5448049Z z 65535(0xffff) 2025-12-04T11:13:55.5448284Z Max fbarriers/Workgrp: 32 2025-12-04T11:13:55.5448454Z Packet Processor uCode:: 185 2025-12-04T11:13:55.5448614Z SDMA engine uCode:: 24 2025-12-04T11:13:55.5448770Z IOMMU Support:: None 2025-12-04T11:13:55.5448905Z Pool Info: 2025-12-04T11:13:55.5449011Z Pool 1 2025-12-04T11:13:55.5449142Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:13:55.5449295Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5449444Z Allocatable: TRUE 2025-12-04T11:13:55.5449600Z Alloc Granule: 4KB 2025-12-04T11:13:55.5449807Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5449969Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5450131Z Accessible by all: FALSE 2025-12-04T11:13:55.5450303Z Pool 2 2025-12-04T11:13:55.5450433Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:13:55.5450583Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5450731Z Allocatable: TRUE 2025-12-04T11:13:55.5450886Z Alloc Granule: 4KB 2025-12-04T11:13:55.5451047Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5451210Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5451369Z Accessible by all: FALSE 2025-12-04T11:13:55.5451506Z Pool 3 2025-12-04T11:13:55.5451633Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:13:55.5451782Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5451934Z Allocatable: TRUE 2025-12-04T11:13:55.5452089Z Alloc Granule: 4KB 2025-12-04T11:13:55.5452250Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5452412Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5452569Z Accessible by all: FALSE 2025-12-04T11:13:55.5452707Z Pool 4 2025-12-04T11:13:55.5452831Z Segment: GROUP 2025-12-04T11:13:55.5452973Z Size: 64(0x40) KB 2025-12-04T11:13:55.5453118Z Allocatable: FALSE 2025-12-04T11:13:55.5453273Z Alloc Granule: 0KB 2025-12-04T11:13:55.5453434Z Alloc Recommended Granule:0KB 2025-12-04T11:13:55.5453597Z Alloc Alignment: 0KB 2025-12-04T11:13:55.5453754Z Accessible by all: FALSE 2025-12-04T11:13:55.5453891Z ISA Info: 2025-12-04T11:13:55.5453996Z ISA 1 2025-12-04T11:13:55.5454127Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T11:13:55.5454291Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:13:55.5454450Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:13:55.5454610Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5454773Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5454926Z Fast f16: TRUE 2025-12-04T11:13:55.5455078Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5455261Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5455393Z x 1024(0x400) 2025-12-04T11:13:55.5455524Z y 1024(0x400) 2025-12-04T11:13:55.5455652Z z 1024(0x400) 2025-12-04T11:13:55.5455793Z 
Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5455932Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5456055Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5456186Z y 65535(0xffff) 2025-12-04T11:13:55.5456314Z z 65535(0xffff) 2025-12-04T11:13:55.5456458Z FBarrier Max Size: 32 2025-12-04T11:13:55.5456592Z ISA 2 2025-12-04T11:13:55.5456735Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T11:13:55.5456938Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:13:55.5457099Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:13:55.5457259Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5457421Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5457573Z Fast f16: TRUE 2025-12-04T11:13:55.5457723Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5457865Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5457991Z x 1024(0x400) 2025-12-04T11:13:55.5458119Z y 1024(0x400) 2025-12-04T11:13:55.5458246Z z 1024(0x400) 2025-12-04T11:13:55.5458388Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5458530Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5458650Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5458780Z y 65535(0xffff) 2025-12-04T11:13:55.5458909Z z 65535(0xffff) 2025-12-04T11:13:55.5459053Z FBarrier Max Size: 32 2025-12-04T11:13:55.5459188Z ******* 2025-12-04T11:13:55.5459288Z Agent 6 2025-12-04T11:13:55.5459386Z ******* 2025-12-04T11:13:55.5459501Z Name: gfx942 2025-12-04T11:13:55.5459645Z Uuid: GPU-10f755404c07bc49 2025-12-04T11:13:55.5459836Z Marketing Name: AMD Radeon Graphics 2025-12-04T11:13:55.5459997Z Vendor Name: AMD 2025-12-04T11:13:55.5460153Z Feature: KERNEL_DISPATCH 2025-12-04T11:13:55.5460306Z Profile: BASE_PROFILE 2025-12-04T11:13:55.5460459Z Float Round Mode: NEAR 2025-12-04T11:13:55.5460613Z Max Queue Number: 128(0x80) 2025-12-04T11:13:55.5460764Z Queue Min Size: 64(0x40) 2025-12-04T11:13:55.5460912Z Queue Max Size: 131072(0x20000) 2025-12-04T11:13:55.5461060Z Queue Type: MULTI 2025-12-04T11:13:55.5461202Z Node: 5 2025-12-04T11:13:55.5461346Z Device Type: GPU 2025-12-04T11:13:55.5461479Z Cache Info: 2025-12-04T11:13:55.5461595Z L1: 32(0x20) KB 2025-12-04T11:13:55.5461772Z L2: 4096(0x1000) KB 2025-12-04T11:13:55.5461904Z L3: 262144(0x40000) KB 2025-12-04T11:13:55.5462039Z Chip ID: 29861(0x74a5) 2025-12-04T11:13:55.5462186Z ASIC Revision: 1(0x1) 2025-12-04T11:13:55.5462338Z Cacheline Size: 128(0x80) 2025-12-04T11:13:55.5462491Z Max Clock Freq. (MHz): 2100 2025-12-04T11:13:55.5462637Z BDFID: 5376 2025-12-04T11:13:55.5462784Z Internal Node ID: 5 2025-12-04T11:13:55.5462936Z Compute Unit: 304 2025-12-04T11:13:55.5463084Z SIMDs per CU: 4 2025-12-04T11:13:55.5463236Z Shader Engines: 32 2025-12-04T11:13:55.5463395Z Shader Arrs. per Eng.: 1 2025-12-04T11:13:55.5463592Z WatchPts on Addr. 
Ranges:4 2025-12-04T11:13:55.5463753Z Coherent Host Access: FALSE 2025-12-04T11:13:55.5463894Z Memory Properties: 2025-12-04T11:13:55.5464012Z Features: KERNEL_DISPATCH 2025-12-04T11:13:55.5464158Z Fast F16 Operation: TRUE 2025-12-04T11:13:55.5464315Z Wavefront Size: 64(0x40) 2025-12-04T11:13:55.5464474Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5464618Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5464744Z x 1024(0x400) 2025-12-04T11:13:55.5464873Z y 1024(0x400) 2025-12-04T11:13:55.5465001Z z 1024(0x400) 2025-12-04T11:13:55.5465143Z Max Waves Per CU: 32(0x20) 2025-12-04T11:13:55.5465302Z Max Work-item Per CU: 2048(0x800) 2025-12-04T11:13:55.5465456Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5465596Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5465718Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5465851Z y 65535(0xffff) 2025-12-04T11:13:55.5465981Z z 65535(0xffff) 2025-12-04T11:13:55.5466127Z Max fbarriers/Workgrp: 32 2025-12-04T11:13:55.5466295Z Packet Processor uCode:: 185 2025-12-04T11:13:55.5466456Z SDMA engine uCode:: 24 2025-12-04T11:13:55.5466613Z IOMMU Support:: None 2025-12-04T11:13:55.5466750Z Pool Info: 2025-12-04T11:13:55.5466860Z Pool 1 2025-12-04T11:13:55.5466999Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T11:13:55.5467152Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5467303Z Allocatable: TRUE 2025-12-04T11:13:55.5467461Z Alloc Granule: 4KB 2025-12-04T11:13:55.5467626Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5467790Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5467950Z Accessible by all: FALSE 2025-12-04T11:13:55.5468089Z Pool 2 2025-12-04T11:13:55.5468221Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T11:13:55.5468373Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5468550Z Allocatable: TRUE 2025-12-04T11:13:55.5468708Z Alloc Granule: 4KB 2025-12-04T11:13:55.5468871Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5469035Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5469194Z Accessible by all: FALSE 2025-12-04T11:13:55.5469333Z Pool 3 2025-12-04T11:13:55.5469462Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T11:13:55.5469640Z Size: 268419072(0xfffc000) KB 2025-12-04T11:13:55.5469830Z Allocatable: TRUE 2025-12-04T11:13:55.5469997Z Alloc Granule: 4KB 2025-12-04T11:13:55.5470163Z Alloc Recommended Granule:2048KB 2025-12-04T11:13:55.5470331Z Alloc Alignment: 4KB 2025-12-04T11:13:55.5470536Z Accessible by all: FALSE 2025-12-04T11:13:55.5470675Z Pool 4 2025-12-04T11:13:55.5470800Z Segment: GROUP 2025-12-04T11:13:55.5470945Z Size: 64(0x40) KB 2025-12-04T11:13:55.5471093Z Allocatable: FALSE 2025-12-04T11:13:55.5471248Z Alloc Granule: 0KB 2025-12-04T11:13:55.5471411Z Alloc Recommended Granule:0KB 2025-12-04T11:13:55.5471574Z Alloc Alignment: 0KB 2025-12-04T11:13:55.5471733Z Accessible by all: FALSE 2025-12-04T11:13:55.5471871Z ISA Info: 2025-12-04T11:13:55.5471976Z ISA 1 2025-12-04T11:13:55.5472111Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T11:13:55.5472280Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:13:55.5472441Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:13:55.5472601Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5472764Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5472917Z Fast f16: TRUE 2025-12-04T11:13:55.5473070Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5473215Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5473346Z x 1024(0x400) 2025-12-04T11:13:55.5473479Z y 1024(0x400) 2025-12-04T11:13:55.5473609Z z 1024(0x400) 2025-12-04T11:13:55.5473752Z 
Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5473895Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5474015Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5474147Z y 65535(0xffff) 2025-12-04T11:13:55.5474278Z z 65535(0xffff) 2025-12-04T11:13:55.5474423Z FBarrier Max Size: 32 2025-12-04T11:13:55.5474560Z ISA 2 2025-12-04T11:13:55.5474701Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T11:13:55.5474877Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T11:13:55.5475039Z Profiles: HSA_PROFILE_BASE 2025-12-04T11:13:55.5475200Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5475404Z Default Rounding Mode: NEAR 2025-12-04T11:13:55.5475559Z Fast f16: TRUE 2025-12-04T11:13:55.5475711Z Workgroup Max Size: 1024(0x400) 2025-12-04T11:13:55.5475856Z Workgroup Max Size per Dimension: 2025-12-04T11:13:55.5475983Z x 1024(0x400) 2025-12-04T11:13:55.5476115Z y 1024(0x400) 2025-12-04T11:13:55.5476243Z z 1024(0x400) 2025-12-04T11:13:55.5476384Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T11:13:55.5476523Z Grid Max Size per Dimension: 2025-12-04T11:13:55.5476646Z x 2147483647(0x7fffffff) 2025-12-04T11:13:55.5476777Z y 65535(0xffff) 2025-12-04T11:13:55.5476910Z z 65535(0xffff) 2025-12-04T11:13:55.5477084Z FBarrier Max Size: 32 2025-12-04T11:13:55.5477220Z *** Done *** 2025-12-04T11:13:55.5477334Z + rocminfo 2025-12-04T11:13:55.5477434Z + grep -E 'Name:.*\sgfx|Marketing' 2025-12-04T11:13:55.6408039Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:13:55.6408234Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T11:13:55.6408406Z Name: gfx942 2025-12-04T11:13:55.6408561Z Marketing Name: AMD Radeon Graphics 2025-12-04T11:13:55.6408712Z Name: gfx942 2025-12-04T11:13:55.6408863Z Marketing Name: AMD Radeon Graphics 2025-12-04T11:13:55.6409013Z Name: gfx942 2025-12-04T11:13:55.6409168Z Marketing Name: AMD Radeon Graphics 2025-12-04T11:13:55.6409318Z Name: gfx942 2025-12-04T11:13:55.6409472Z Marketing Name: AMD Radeon Graphics 2025-12-04T11:13:55.6498156Z + MAYBE_ROCM=rocm/ 2025-12-04T11:13:55.6498328Z + [[ linux-noble-rocm-py3.12-mi300 == *xpu* ]] 2025-12-04T11:13:55.6498867Z + [[ linux-noble-rocm-py3.12-mi300 != *-bazel-* ]] 2025-12-04T11:13:55.6499215Z + pip_install ninja==1.10.2 2025-12-04T11:13:55.6499527Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T11:13:55.6499937Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-12-04T11:13:55.8339970Z Collecting ninja==1.10.2 2025-12-04T11:13:55.8592327Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T11:13:55.8670722Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T11:13:55.9601605Z Installing collected packages: ninja 2025-12-04T11:13:55.9602038Z Attempting uninstall: ninja 2025-12-04T11:13:55.9614097Z Found existing installation: ninja 1.11.1.4 2025-12-04T11:13:55.9623930Z Uninstalling ninja-1.11.1.4: 2025-12-04T11:13:55.9650021Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T11:13:55.9732346Z Successfully installed ninja-1.10.2 2025-12-04T11:13:56.0049036Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T11:13:56.0051035Z + 
PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T11:13:56.0052184Z + [[ linux-noble-rocm-py3.12-mi300 == *aarch64* ]] 2025-12-04T11:13:56.0052911Z + [[ linux-noble-rocm-py3.12-mi300 == *asan* ]] 2025-12-04T11:13:56.0053312Z + [[ linux-noble-rocm-py3.12-mi300 == *-debug* ]] 2025-12-04T11:13:56.0053731Z + [[ linux-noble-rocm-py3.12-mi300 != *-bazel-* ]] 2025-12-04T11:13:56.0054305Z + echo 'We are not in debug mode: linux-noble-rocm-py3.12-mi300. Expect the assertion to pass' 2025-12-04T11:13:56.0054983Z We are not in debug mode: linux-noble-rocm-py3.12-mi300. Expect the assertion to pass 2025-12-04T11:13:56.0055477Z + cd test 2025-12-04T11:13:56.0055857Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T11:13:56.8657620Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T11:13:56.8658121Z + [[ distributed == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T11:13:56.8658545Z + [[ distributed == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T11:13:56.8659910Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T11:13:56.8660274Z + [[ distributed == *pr_time_benchmarks* ]] 2025-12-04T11:13:56.8660634Z + [[ distributed == *dynamo_eager* ]] 2025-12-04T11:13:56.8661017Z + [[ distributed == *aot_eager* ]] 2025-12-04T11:13:56.8661961Z + [[ distributed == *aot_inductor* ]] 2025-12-04T11:13:56.8662314Z + [[ distributed == *max_autotune_inductor* ]] 2025-12-04T11:13:56.8662672Z + [[ distributed == *inductor* ]] 2025-12-04T11:13:56.8663006Z + [[ distributed == *dynamic* ]] 2025-12-04T11:13:56.8663328Z + [[ distributed == *cpu* ]] 2025-12-04T11:13:56.8663632Z + [[ distributed == *xpu* ]] 2025-12-04T11:13:56.8663976Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T11:13:56.8675655Z + [[ linux-noble-rocm-py3.12-mi300 == *libtorch* ]] 2025-12-04T11:13:56.8675913Z + [[ linux-noble-rocm-py3.12-mi300 == *-bazel-* ]] 2025-12-04T11:13:56.8678898Z + cd test 2025-12-04T11:13:56.8679593Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T11:13:57.6504115Z PyTorch built with: 2025-12-04T11:13:57.6504307Z - GCC 11.5 2025-12-04T11:13:57.6504454Z - C++ Version: 201703 2025-12-04T11:13:57.6504803Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T11:13:57.6505215Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T11:13:57.6505469Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T11:13:57.6505676Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T11:13:57.6505866Z - NNPACK is enabled 2025-12-04T11:13:57.6506029Z - CPU capability usage: AVX512 2025-12-04T11:13:57.6506224Z - HIP Runtime 7.1.25424 2025-12-04T11:13:57.6506378Z - MIOpen 3.5.1 2025-12-04T11:13:57.6506522Z - Magma 2.9.0 2025-12-04T11:13:57.6508972Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_FBGEMM_GENAI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=ON, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T11:13:57.6511466Z 2025-12-04T11:13:57.9086390Z + cd test 2025-12-04T11:13:57.9086703Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T11:13:58.5829527Z ATen/Parallel: 2025-12-04T11:13:58.5830173Z at::get_num_threads() : 128 2025-12-04T11:13:58.5831214Z at::get_num_interop_threads() : 128 2025-12-04T11:13:58.5831583Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T11:13:58.5831940Z omp_get_max_threads() : 128 2025-12-04T11:13:58.5832564Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T11:13:58.5833177Z mkl_get_max_threads() : 128 2025-12-04T11:13:58.5833615Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T11:13:58.5834089Z std::thread::hardware_concurrency() : 128 2025-12-04T11:13:58.5834439Z Environment variables: 2025-12-04T11:13:58.5834734Z OMP_NUM_THREADS : [not set] 2025-12-04T11:13:58.5835028Z MKL_NUM_THREADS : [not set] 2025-12-04T11:13:58.5835335Z ATen parallel backend: OpenMP 2025-12-04T11:13:58.5835545Z 2025-12-04T11:13:58.7860906Z + [[ distributed == *numpy_2* ]] 2025-12-04T11:13:58.7861090Z + [[ linux-noble-rocm-py3.12-mi300 == *aarch64* ]] 2025-12-04T11:13:58.7861250Z + [[ distributed == *backward* ]] 2025-12-04T11:13:58.7861408Z + [[ distributed == *libtorch_agnostic_targetting* ]] 2025-12-04T11:13:58.7861759Z + [[ distributed == *xla* ]] 2025-12-04T11:13:58.7861881Z + [[ distributed == *vllm* ]] 2025-12-04T11:13:58.7862003Z + [[ distributed == *executorch* ]] 2025-12-04T11:13:58.7862138Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T11:13:58.7862278Z + [[ distributed == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T11:13:58.7862431Z + [[ linux-noble-rocm-py3.12-mi300 == *libtorch* ]] 2025-12-04T11:13:58.7862582Z + [[ distributed == distributed ]] 2025-12-04T11:13:58.7862708Z + test_distributed 2025-12-04T11:13:58.7862829Z + echo 'Testing distributed python tests' 2025-12-04T11:13:58.7862967Z Testing distributed python tests 2025-12-04T11:13:58.7863142Z + python test/run_test.py --distributed-tests --shard 1 3 --verbose 2025-12-04T11:14:00.3493263Z Excluding distributed/rpc/test_faulty_agent on ROCm 2025-12-04T11:14:00.3493669Z Excluding distributed/rpc/test_tensorpipe_agent on ROCm 2025-12-04T11:14:00.3494037Z Excluding distributed/rpc/test_share_memory on ROCm 2025-12-04T11:14:00.3494400Z Excluding distributed/rpc/cuda/test_tensorpipe_agent on ROCm 2025-12-04T11:14:01.2128921Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-12-04T11:14:01.5560185Z Ignoring disabled issues: [''] 2025-12-04T11:14:01.5611563Z Found test times from artifacts 2025-12-04T11:14:01.5785009Z Found test times from artifacts 2025-12-04T11:14:01.5789851Z Running all tests 2025-12-04T11:14:01.5836247Z Running parallel tests on 1 processes 2025-12-04T11:14:01.5836829Z Name: tests to run (est. 
time: 120.59min) 2025-12-04T11:14:01.5837054Z Serial tests (80): 2025-12-04T11:14:01.5838379Z distributed/test_inductor_collectives 1/1 2025-12-04T11:14:01.5838686Z distributed/pipelining/test_schedule_multiproc 1/1 2025-12-04T11:14:01.5838900Z distributed/checkpoint/_experimental/test_barriers 1/1 2025-12-04T11:14:01.5839104Z distributed/pipelining/test_transformer 1/1 2025-12-04T11:14:01.5839346Z distributed/flight_recorder/test_fr_analysis 1/1 2025-12-04T11:14:01.5839542Z distributed/_composable/test_contract 1/1 2025-12-04T11:14:01.5839770Z distributed/checkpoint/test_dedup_tensors 1/1 2025-12-04T11:14:01.5839943Z distributed/test_c10d_functional_native 1/1 2025-12-04T11:14:01.5840109Z distributed/test_nvshmem_triton 1/1 2025-12-04T11:14:01.5840263Z distributed/test_cupy_as_tensor 1/1 2025-12-04T11:14:01.5840415Z distributed/fsdp/test_fsdp_fx 1/1 2025-12-04T11:14:01.5840568Z distributed/_tools/test_sac_ilp 1/1 2025-12-04T11:14:01.5840728Z distributed/checkpoint/test_hf_storage 1/1 2025-12-04T11:14:01.5840895Z distributed/pipelining/test_microbatch 1/1 2025-12-04T11:14:01.5841064Z distributed/tensor/test_placement_types 1/1 2025-12-04T11:14:01.5841247Z distributed/tensor/test_dtensor_dispatch_overhead 1/1 2025-12-04T11:14:01.5841463Z distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 2025-12-04T11:14:01.5842163Z distributed/checkpoint/test_format_utils 1/1 2025-12-04T11:14:01.5842349Z distributed/test_aten_comm_compute_reordering 1/1 2025-12-04T11:14:01.5842541Z distributed/checkpoint/test_quantized_hf_storage 1/1 2025-12-04T11:14:01.5842761Z distributed/_composable/test_composability/test_pp_composability 1/1 2025-12-04T11:14:01.5842963Z distributed/test_device_mesh 1/1 2025-12-04T11:14:01.5843126Z distributed/tensor/parallel/test_tp_style 1/1 2025-12-04T11:14:01.5843295Z distributed/checkpoint/test_fsspec 1/1 2025-12-04T11:14:01.5843475Z distributed/tensor/experimental/test_tp_transform 1/1 2025-12-04T11:14:01.5843681Z distributed/_composable/test_replicate_mixed_precision 1/1 2025-12-04T11:14:01.5843896Z distributed/_composable/fsdp/test_fully_shard_logging 1/1 2025-12-04T11:14:01.5844110Z distributed/_composable/fsdp/test_fully_shard_ignore_params 1/1 2025-12-04T11:14:01.5844307Z distributed/tensor/test_embedding_ops 1/1 2025-12-04T11:14:01.5844486Z distributed/checkpoint/test_fsdp_optim_state 1/1 2025-12-04T11:14:01.5844821Z distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 2025-12-04T11:14:01.5845004Z distributed/_tools/test_runtime_estimator 1/1 2025-12-04T11:14:01.5845171Z distributed/fsdp/test_fsdp_memory 1/1 2025-12-04T11:14:01.5845333Z distributed/tensor/test_pointwise_ops 1/1 2025-12-04T11:14:01.5845500Z distributed/checkpoint/test_compatibility 1/1 2025-12-04T11:14:01.5845669Z distributed/_tools/test_mem_tracker 1/1 2025-12-04T11:14:01.5845830Z distributed/elastic/test_control_plane 1/1 2025-12-04T11:14:01.5845992Z distributed/fsdp/test_fsdp_overlap 1/1 2025-12-04T11:14:01.5846185Z distributed/test_fake_pg 1/1 2025-12-04T11:14:01.5846347Z distributed/checkpoint/test_fsdp_model_state 1/1 2025-12-04T11:14:01.5846515Z distributed/fsdp/test_utils 1/1 2025-12-04T11:14:01.5846680Z distributed/tensor/parallel/test_tp_examples 1/1 2025-12-04T11:14:01.5846896Z distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 2025-12-04T11:14:01.5847095Z distributed/tensor/debug/test_comm_mode 1/1 2025-12-04T11:14:01.5847261Z distributed/test_dist2 1/1 2025-12-04T11:14:01.5847440Z distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 
2025-12-04T11:14:01.5847630Z distributed/launcher/test_run 1/1 2025-12-04T11:14:01.5847801Z distributed/fsdp/test_fsdp_backward_prefetch 1/1 2025-12-04T11:14:01.5847977Z distributed/fsdp/test_fsdp_pure_fp16 1/1 2025-12-04T11:14:01.5848141Z distributed/checkpoint/test_checkpoint 1/1 2025-12-04T11:14:01.5848302Z distributed/_pycute/test_coalesce 1/1 2025-12-04T11:14:01.5848464Z distributed/_pycute/test_complement 1/1 2025-12-04T11:14:01.5848629Z distributed/_pycute/test_composition 1/1 2025-12-04T11:14:01.5848787Z distributed/_pycute/test_int_tuple 1/1 2025-12-04T11:14:01.5848947Z distributed/_pycute/test_left_inverse 1/1 2025-12-04T11:14:01.5849109Z distributed/_pycute/test_right_inverse 1/1 2025-12-04T11:14:01.5849279Z distributed/tensor/debug/test_debug_mode 1/1 2025-12-04T11:14:01.5849448Z distributed/fsdp/test_fsdp_apply 1/1 2025-12-04T11:14:01.5849626Z distributed/_composable/fsdp/test_fully_shard_frozen 1/1 2025-12-04T11:14:01.5849871Z distributed/checkpoint/test_hsdp_checkpoint 1/1 2025-12-04T11:14:01.5850062Z distributed/tensor/parallel/test_parallelize_api 1/1 2025-12-04T11:14:01.5850237Z distributed/tensor/test_view_ops 1/1 2025-12-04T11:14:01.5850396Z distributed/fsdp/test_fsdp_state_dict 1/2 2025-12-04T11:14:01.5850553Z distributed/_pycute/test_typing 1/1 2025-12-04T11:14:01.5850703Z distributed/test_distributed_spawn 1/7 2025-12-04T11:14:01.5850862Z distributed/test_distributed_spawn 4/7 2025-12-04T11:14:01.5851017Z distributed/test_distributed_spawn 7/7 2025-12-04T11:14:01.5851186Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 2025-12-04T11:14:01.5851382Z distributed/_composable/fsdp/test_fully_shard_training 1/1 2025-12-04T11:14:01.5851625Z distributed/_shard/sharded_tensor/ops/test_binary_cmp 1/1 2025-12-04T11:14:01.5851784Z distributed/test_nccl 1/1 2025-12-04T11:14:01.5851908Z distributed/fsdp/test_fsdp_meta 1/1 2025-12-04T11:14:01.5852041Z distributed/test_data_parallel 1/1 2025-12-04T11:14:01.5852179Z distributed/checkpoint/test_state_dict 1/1 2025-12-04T11:14:01.5852317Z distributed/fsdp/test_fsdp_core 3/3 2025-12-04T11:14:01.5852447Z distributed/test_c10d_ucc 1/1 2025-12-04T11:14:01.5852588Z distributed/fsdp/test_fsdp_use_orig_params 1/1 2025-12-04T11:14:01.5852733Z distributed/test_c10d_common 1/1 2025-12-04T11:14:01.5852888Z distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 2025-12-04T11:14:01.5853048Z distributed/test_c10d_nccl 3/3 2025-12-04T11:14:01.5853174Z Parallel tests (0): 2025-12-04T11:14:01.5853295Z Name: excluded (est. time: 0.0min) 2025-12-04T11:14:01.5853417Z Serial tests (0): 2025-12-04T11:14:01.5853522Z Parallel tests (0): 2025-12-04T11:14:01.5853719Z Running distributed/test_inductor_collectives 1/1 ... [2025-12-04 11:14:01.583963][2231984.058989948] 2025-12-04T11:14:01.5853978Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:14:01.5854404Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_inductor_collectives.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:14:01.584153] 2025-12-04T11:18:29.6634460Z 2025-12-04T11:18:29.6635630Z distributed/test_inductor_collectives 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_inductor_collectives_1.1_fdf824b32333acbf_.log 2025-12-04T11:18:29.6656257Z Running 69 items in this shard: test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_all_to_all_recompute_is_always_banned_override_with_ac_False, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_all_to_all_recompute_is_always_banned_override_with_ac_True, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_all_to_all_single_inductor, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_all_to_all_single_inductor_split_sizes_none, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_allgather_contiguous_input, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_allgather_into_tensor_inductor, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_allgather_output_buffer_reuse, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_allgather_scalar_tensor_input, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_allreduce_inductor, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_allreduce_inductor_cudagraph_trees, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_allreduce_input_buffer_reuse, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_broadcast_inductor, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_c10d_functional_tagged_pt2_compliant, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_eager_allreduce_inductor_wait, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_eager_async_allreduce_inductor_wait, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_inductor_allreduce_eager_wait, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_permute_tensor, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_reduce_scatter_tensor_inductor, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_gather_bucket_bucket_mode_all, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_gather_bucket_bucket_mode_all_custom_ops, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_gather_bucket_multidtype_bucket_mode_all_custom_ops_multidtype, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_gather_bucket_path, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_reduce_bucket_bucket_mode_all, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_backwards, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_get_world_group_source_GroupMember_WORLD, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_get_world_group_source__get_default_group, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_get_world_group_source_group_WORLD, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_graphbreaks_unsupported_async_op, 
test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_pg_var, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_all_gather, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_all_gather_args_match, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_all_gather_list, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_all_to_all_single, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_pg_mode_kwargs, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_pg_mode_kwargs_none, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_pg_mode_positional, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_pg_mode_positional_none, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_pg_mode_unspecified, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_reduce_op_reduce_op0, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_reduce_op_reduce_op1, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_reduce_op_reduce_op2, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_reduce_op_reduce_op3, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_reduce_op_reduce_op4, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_reduce_scatter, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_support_collective_op_with_async_op_False, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_trace_all_gather_tensor, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_trace_all_gather_tensor_pg, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_trace_allgather_coalesced, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_trace_allreduce, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_trace_reduce_scatter_tensor, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_all_gather_coalesced, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_doesnt_mutate_shared, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_doesnt_mutate_shared_graph_partition, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_reduce_scatter_coalesced, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_single_op, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_steal_buffer, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_meta, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_reduce_scatter_bucket_bucket_mode_all, 
test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_reduce_scatter_bucket_bucket_mode_all_custom_ops, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_reorder_peak_memory, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_reorder_peak_memory_bucketed_bucket_mode_all, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_reorder_peak_memory_bucketed_bucket_mode_all_custom_ops, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_reorder_respects_wait_dep, test/distributed/test_inductor_collectives.py::TestSyncDecisionCrossRanks::test_all_gather_comm_analysis, test/distributed/test_inductor_collectives.py::TestSyncDecisionCrossRanks::test_all_reduce_comm_analysis, test/distributed/test_inductor_collectives.py::TestSyncDecisionCrossRanks::test_all_to_all_comm_analysis, test/distributed/test_inductor_collectives.py::TestSyncDecisionCrossRanks::test_reduce_scatter_comm_analysis, test/distributed/test_inductor_collectives.py::TestSyncDecisionCrossRanks::test_regression_use_nccl_estimate_with_gloo, test/distributed/test_inductor_collectives.py::TestSyncDecisionCrossRanks::test_sync_decision_cross_ranks 2025-12-04T11:18:29.6668939Z 2025-12-04T11:18:29.6669085Z Finished distributed/test_inductor_collectives 1/1 ... [2025-12-04 11:18:29.663509][2232252.138533323], took 4.47min 2025-12-04T11:18:29.6669542Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:18:31.7093007Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:18:31.7093601Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T11:18:31.7094006Z Uploading artifacts took 0.00 seconds 2025-12-04T11:18:31.7094471Z Running distributed/pipelining/test_schedule_multiproc 1/1 ... [2025-12-04 11:18:31.709068][2232254.184088901] 2025-12-04T11:18:31.7094944Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:18:31.7095846Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/pipelining/test_schedule_multiproc.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:18:31.709357] 2025-12-04T11:18:52.8144711Z 2025-12-04T11:18:52.8145549Z distributed/pipelining/test_schedule_multiproc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_schedule_multiproc_1.1_d96f60b79740f37a_.log 2025-12-04T11:18:52.8153588Z Running 34 items in this shard: test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_custom_function_callback, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass3, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass4, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_forward_only_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass0_shape_inference_False, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass0_shape_inference_True, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass1_shape_inference_False, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass1_shape_inference_True, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_tracer_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_tracer_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_kwargs_with_tracer_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_kwargs_with_tracer_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_multi_iter_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_multi_iter_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass3, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass4, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_schedule_with_weight_update_mlp_e2e_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_v_shape_schedules_schedule_class0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_v_shape_schedules_schedule_class1, 
test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_zero_bubble_with_model_kwargs_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_zero_bubble_with_model_kwargs_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_non_symmetric_stage_ids_schedule_class0, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_non_symmetric_stage_ids_schedule_class1, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_pipeline_schedule_runtime_custom_sched_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_schedule_with_native_zero_bubble_ScheduleClass0 2025-12-04T11:18:52.8159905Z 2025-12-04T11:18:52.8160080Z Finished distributed/pipelining/test_schedule_multiproc 1/1 ... [2025-12-04 11:18:52.814200][2232275.289225723], took 0.35min 2025-12-04T11:18:52.8160576Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:18:52.8163974Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:18:52.8167294Z Running distributed/checkpoint/_experimental/test_barriers 1/1 ... [2025-12-04 11:18:52.816647][2232275.291677021] 2025-12-04T11:18:52.8167531Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:18:52.8169315Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_barriers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:18:52.816841] 2025-12-04T11:18:54.9846180Z 2025-12-04T11:18:54.9847582Z distributed/checkpoint/_experimental/test_barriers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_barriers_1.1_4a0b5cb5c986554e_.log 2025-12-04T11:18:54.9849006Z Running 2 items in this shard: test/distributed/checkpoint/_experimental/test_barriers.py::TestBarriers::test_execute_barrier, test/distributed/checkpoint/_experimental/test_barriers.py::TestBarriers::test_tcpstore_barrier_initialization 2025-12-04T11:18:54.9849941Z 2025-12-04T11:18:54.9850278Z Finished distributed/checkpoint/_experimental/test_barriers 1/1 ... [2025-12-04 11:18:54.984262][2232277.459286692], took 0.04min 2025-12-04T11:18:54.9851242Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:18:54.9869582Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:18:54.9873042Z Running distributed/pipelining/test_transformer 1/1 ... [2025-12-04 11:18:54.987205][2232277.462235018] 2025-12-04T11:18:54.9873416Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:18:54.9875223Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/pipelining/test_transformer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:18:54.987405] 2025-12-04T11:18:59.4586907Z 2025-12-04T11:18:59.4588182Z distributed/pipelining/test_transformer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_transformer_1.1_1976e25289ed9d03_.log 2025-12-04T11:18:59.4589109Z Running 1 items in this shard: test/distributed/pipelining/test_transformer.py::TransformerTestsCUDA::test_ir_cuda 2025-12-04T11:18:59.4589467Z 2025-12-04T11:18:59.4589896Z Finished distributed/pipelining/test_transformer 1/1 ... [2025-12-04 11:18:59.458363][2232281.933388438], took 0.07min 2025-12-04T11:18:59.4591367Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:18:59.4611466Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:18:59.4614653Z Running distributed/flight_recorder/test_fr_analysis 1/1 ... [2025-12-04 11:18:59.461335][2232281.936364534] 2025-12-04T11:18:59.4615010Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:18:59.4616946Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/flight_recorder/test_fr_analysis.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:18:59.461575] 2025-12-04T11:19:01.6296902Z 2025-12-04T11:19:01.6297859Z distributed/flight_recorder/test_fr_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.flight_recorder.test_fr_analysis_1.1_544492ff20e3fd10_.log 2025-12-04T11:19:01.6299610Z Running 4 items in this shard: test/distributed/flight_recorder/test_fr_analysis.py::FlightRecorderEventTest::test_all_events, test/distributed/flight_recorder/test_fr_analysis.py::FlightRecorderEventTest::test_match_one_event, test/distributed/flight_recorder/test_fr_analysis.py::FlightMatchInfoTest::test_match_info, test/distributed/flight_recorder/test_fr_analysis.py::FlightRecorderE2ETest::testBuildDB 2025-12-04T11:19:01.6300900Z 2025-12-04T11:19:01.6301192Z Finished distributed/flight_recorder/test_fr_analysis 1/1 ... [2025-12-04 11:19:01.629420][2232284.104444328], took 0.04min 2025-12-04T11:19:01.6303201Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:19:01.6322448Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:19:01.6328708Z Running distributed/_composable/test_contract 1/1 ... [2025-12-04 11:19:01.632490][2232284.107519571] 2025-12-04T11:19:01.6329048Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:19:01.6329664Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/test_contract.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:19:01.632685] 2025-12-04T11:19:03.7510370Z 2025-12-04T11:19:03.7511534Z distributed/_composable/test_contract 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_contract_1.1_a975d5d809c1a75b_.log 2025-12-04T11:19:03.7514015Z Running 5 items in this shard: test/distributed/_composable/test_contract.py::TestContract::test_add_hooks, test/distributed/_composable/test_contract.py::TestContract::test_modify_fqn, test/distributed/_composable/test_contract.py::TestContract::test_multi_module_api, test/distributed/_composable/test_contract.py::TestContract::test_registry, test/distributed/_composable/test_contract.py::TestContract::test_state 2025-12-04T11:19:03.7516440Z 2025-12-04T11:19:03.7516838Z Finished distributed/_composable/test_contract 1/1 ... [2025-12-04 11:19:03.750691][2232286.225715699], took 0.04min 2025-12-04T11:19:03.7518099Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:19:03.7535427Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:19:03.7538312Z Running distributed/checkpoint/test_dedup_tensors 1/1 ... [2025-12-04 11:19:03.753732][2232286.228761873] 2025-12-04T11:19:03.7538663Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:19:03.7540547Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_dedup_tensors.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:19:03.753928] 2025-12-04T11:19:05.8721427Z 2025-12-04T11:19:05.8722539Z distributed/checkpoint/test_dedup_tensors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_dedup_tensors_1.1_e12456f53e2bc755_.log 2025-12-04T11:19:05.8723728Z Running 1 items in this shard: test/distributed/checkpoint/test_dedup_tensors.py::TestDedupTensor::test_dedup_shards 2025-12-04T11:19:05.8724177Z 2025-12-04T11:19:05.8724511Z Finished distributed/checkpoint/test_dedup_tensors 1/1 ... [2025-12-04 11:19:05.871848][2232288.346873484], took 0.04min 2025-12-04T11:19:05.8727872Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:19:05.8747355Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:19:05.8750700Z Running distributed/test_c10d_functional_native 1/1 ... [2025-12-04 11:19:05.874952][2232288.349981637] 2025-12-04T11:19:05.8751086Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:19:05.8752754Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_c10d_functional_native.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:19:05.875149] 2025-12-04T11:22:23.0315428Z 2025-12-04T11:22:23.0318846Z distributed/test_c10d_functional_native 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_functional_native_1.1_f96ab86a53d5ae6b_.log 2025-12-04T11:22:23.0323861Z Running 33 items in this shard: test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_gather_into_tensor_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_gather_into_tensor_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_coalesced_, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_single_, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_to_all_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_broadcast, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_fixed_striding, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_functional_collectives_inference_mode, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_inductor_dtypeview_memory_leak, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_out, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_threading, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_unwaited, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_wait_tensor, test/distributed/test_c10d_functional_native.py::PyWorkTest::test_collectives, test/distributed/test_c10d_functional_native.py::PyWorkTest::test_wait_tensor, test/distributed/test_c10d_functional_native.py::CompileTestCPU::test_inductor_all_reduce_cpu, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_non_contig_input, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_to_all_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_broadcast, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_inplace_op_on_view, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reduce_scatter_tensor_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reuse_buffer_after_inplace_collective, test/distributed/test_c10d_functional_native.py::CompileTest::test_ranks_and_tag, test/distributed/test_c10d_functional_native.py::CompileTest::test_wait_tensor 2025-12-04T11:22:23.0328223Z 2025-12-04T11:22:23.0328364Z Finished distributed/test_c10d_functional_native 1/1 ... 
[2025-12-04 11:22:23.031205][2232485.506232788], took 3.29min 2025-12-04T11:22:23.0328814Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:22:23.0331824Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:22:23.0334252Z Running distributed/test_nvshmem_triton 1/1 ... [2025-12-04 11:22:23.033306][2232485.508336253] 2025-12-04T11:22:23.0334723Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:22:23.0335993Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_nvshmem_triton.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:22:23.033473] 2025-12-04T11:22:28.3314339Z 2025-12-04T11:22:28.3315305Z distributed/test_nvshmem_triton 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_nvshmem_triton_1.1_10c9f9e077593768_.log 2025-12-04T11:22:28.3315596Z 2025-12-04T11:22:28.3315745Z Finished distributed/test_nvshmem_triton 1/1 ... [2025-12-04 11:22:28.331201][2232490.806228552], took 0.09min 2025-12-04T11:22:28.3316549Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:22:28.3331215Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:22:28.3333782Z Running distributed/test_cupy_as_tensor 1/1 ... [2025-12-04 11:22:28.333319][2232490.808349006] 2025-12-04T11:22:28.3333981Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:22:28.3335619Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_cupy_as_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:22:28.333482] 2025-12-04T11:22:33.7051503Z 2025-12-04T11:22:33.7052802Z distributed/test_cupy_as_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_cupy_as_tensor_1.1_1dd354665b59ba5a_.log 2025-12-04T11:22:33.7053736Z Running 1 items in this shard: test/distributed/test_cupy_as_tensor.py::CupyAsTensorTest::test_cupy_as_tensor 2025-12-04T11:22:33.7054110Z 2025-12-04T11:22:33.7054373Z Finished distributed/test_cupy_as_tensor 1/1 ... [2025-12-04 11:22:33.704836][2232496.179861576], took 0.09min 2025-12-04T11:22:33.7056207Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:22:33.7074741Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:22:33.7077157Z Running distributed/fsdp/test_fsdp_fx 1/1 ... [2025-12-04 11:22:33.707632][2232496.182661356] 2025-12-04T11:22:33.7077499Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:22:33.7079309Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_fx.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:22:33.707809] 2025-12-04T11:22:36.5764269Z 2025-12-04T11:22:36.5765259Z distributed/fsdp/test_fsdp_fx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_fx_1.1_183b527fca7117df_.log 2025-12-04T11:22:36.5766525Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_fx.py::TestSymbolicTracingCUDA::test_symbolic_tracing_outputs_cuda 2025-12-04T11:22:36.5767086Z 2025-12-04T11:22:36.5767427Z Finished distributed/fsdp/test_fsdp_fx 1/1 ... [2025-12-04 11:22:36.576132][2232499.051157467], took 0.05min 2025-12-04T11:22:36.5769548Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:22:36.5788341Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:22:36.5791183Z Running distributed/_tools/test_sac_ilp 1/1 ... [2025-12-04 11:22:36.579001][2232499.054031265] 2025-12-04T11:22:36.5791515Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:22:36.5792793Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_tools/test_sac_ilp.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:22:36.579166] 2025-12-04T11:22:40.5490488Z 2025-12-04T11:22:40.5491434Z distributed/_tools/test_sac_ilp 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_sac_ilp_1.1_a24708da145734eb_.log 2025-12-04T11:22:40.5493473Z Running 4 items in this shard: test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case1, test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case2, test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case3, test/distributed/_tools/test_sac_ilp.py::TestOptimalCheckpointingPolicy::test_get_optimial_checkpointing_policy_per_module 2025-12-04T11:22:40.5494376Z 2025-12-04T11:22:40.5494605Z Finished distributed/_tools/test_sac_ilp 1/1 ... [2025-12-04 11:22:40.548699][2232503.023726037], took 0.07min 2025-12-04T11:22:40.5495357Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:22:40.5510547Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:22:40.5512518Z Running distributed/checkpoint/test_hf_storage 1/1 ... [2025-12-04 11:22:40.551155][2232503.026184945] 2025-12-04T11:22:40.5512814Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:22:40.5514498Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_hf_storage.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:22:40.551343] 2025-12-04T11:22:42.6187244Z 2025-12-04T11:22:42.6188261Z distributed/checkpoint/test_hf_storage 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_hf_storage_1.1_8abb1bc775cc6adf_.log 2025-12-04T11:22:42.6190452Z Running 5 items in this shard: test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_read_data_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_read_metadata_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_data_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_data_with_sharding, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_metadata_hf 2025-12-04T11:22:42.6191856Z 2025-12-04T11:22:42.6192163Z Finished distributed/checkpoint/test_hf_storage 1/1 ... [2025-12-04 11:22:42.618481][2232505.093505849], took 0.03min 2025-12-04T11:22:42.6193251Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:22:42.6212179Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:22:42.6214401Z Running distributed/pipelining/test_microbatch 1/1 ... [2025-12-04 11:22:42.621350][2232505.096379858] 2025-12-04T11:22:42.6214747Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:22:42.6216404Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/pipelining/test_microbatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:22:42.621523] 2025-12-04T11:22:56.7201511Z 2025-12-04T11:22:56.7202770Z distributed/pipelining/test_microbatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_microbatch_1.1_b3d363709b2b8e3d_.log 2025-12-04T11:22:56.7205695Z Running 5 items in this shard: test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_chunk_spec_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_and_merge_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_batch_size_one_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_none_cuda 2025-12-04T11:22:56.7207828Z 2025-12-04T11:22:56.7208135Z Finished distributed/pipelining/test_microbatch 1/1 ... [2025-12-04 11:22:56.719745][2232519.194772696], took 0.23min 2025-12-04T11:22:56.7209545Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:22:56.7221023Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:22:56.7223061Z Running distributed/tensor/test_placement_types 1/1 ... 
[2025-12-04 11:22:56.722218][2232519.197247623] 2025-12-04T11:22:56.7223323Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:22:56.7224779Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/test_placement_types.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:22:56.722376] 2025-12-04T11:22:58.7900111Z 2025-12-04T11:22:58.7901531Z distributed/tensor/test_placement_types 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_placement_types_1.1_a6dc7907fe092070_.log 2025-12-04T11:22:58.7905215Z Running 5 items in this shard: test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_dynamo_can_identify_placement_classes, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_equality, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_strided_shard_isinstance_shard, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_strided_shard_kwonly_argument, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_type_identification 2025-12-04T11:22:58.7907428Z 2025-12-04T11:22:58.7907776Z Finished distributed/tensor/test_placement_types 1/1 ... [2025-12-04 11:22:58.789651][2232521.264676209], took 0.03min 2025-12-04T11:22:58.7908681Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:22:58.7925027Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:22:58.7927957Z Running distributed/tensor/test_dtensor_dispatch_overhead 1/1 ... [2025-12-04 11:22:58.792631][2232521.267660915] 2025-12-04T11:22:58.7928457Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:22:58.7929550Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/test_dtensor_dispatch_overhead.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:22:58.792812] 2025-12-04T11:23:05.5675669Z 2025-12-04T11:23:05.5676856Z distributed/tensor/test_dtensor_dispatch_overhead 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dtensor_dispatch_overhead_1.1_c63c054e742fe598_.log 2025-12-04T11:23:05.5681752Z Running 1 items in this shard: test/distributed/tensor/test_dtensor_dispatch_overhead.py::DistOpDispatchOverHead::test_dtensor_add_op_dispatch_overhead 2025-12-04T11:23:05.5682451Z 2025-12-04T11:23:05.5682899Z Finished distributed/tensor/test_dtensor_dispatch_overhead 1/1 ... [2025-12-04 11:23:05.567236][2232528.042260893], took 0.11min 2025-12-04T11:23:05.5684214Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:23:05.5700623Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:23:05.5702966Z Running distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 ... 
[2025-12-04 11:23:05.570171][2232528.04520042] 2025-12-04T11:23:05.5703407Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:23:05.5705199Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_checkpoint_reader.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:23:05.570349] 2025-12-04T11:23:07.9384625Z 2025-12-04T11:23:07.9385873Z distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_checkpoint_reader_1.1_65e5a211d349c2dd_.log 2025-12-04T11:23:07.9389169Z Running 7 items in this shard: test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read_different_dtypes, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read_missing_keys, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_checkpoint, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_nonexistent_checkpoint, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_with_kwargs, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_with_map_location 2025-12-04T11:23:07.9392307Z 2025-12-04T11:23:07.9392641Z Finished distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 ... [2025-12-04 11:23:07.938081][2232530.413107484], took 0.04min 2025-12-04T11:23:07.9393580Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:23:07.9408207Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:23:07.9410385Z Running distributed/checkpoint/test_format_utils 1/1 ... [2025-12-04 11:23:07.940933][2232530.415962442] 2025-12-04T11:23:07.9410689Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:23:07.9412348Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_format_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:23:07.941106] 2025-12-04T11:23:19.1235500Z 2025-12-04T11:23:19.1236719Z distributed/checkpoint/test_format_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_format_utils_1.1_c2c3d785e742af52_.log 2025-12-04T11:23:19.1238759Z Running 3 items in this shard: test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_dcp_to_torch_save, test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_online_torch_save_to_dcp, test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_torch_save_to_dcp 2025-12-04T11:23:19.1240169Z 2025-12-04T11:23:19.1240607Z Finished distributed/checkpoint/test_format_utils 1/1 ... 
[2025-12-04 11:23:19.123210][2232541.598234317], took 0.19min 2025-12-04T11:23:19.1242776Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:23:19.1261461Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:23:19.1263925Z Running distributed/test_aten_comm_compute_reordering 1/1 ... [2025-12-04 11:23:19.126244][2232541.601274062] 2025-12-04T11:23:19.1264293Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:23:19.1265573Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_aten_comm_compute_reordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:23:19.126422] 2025-12-04T11:31:34.7891696Z 2025-12-04T11:31:34.7896854Z distributed/test_aten_comm_compute_reordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_aten_comm_compute_reordering_1.1_469ee7a1762e0de8_.log 2025-12-04T11:31:34.7912434Z Running 48 items in this shard: test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_custom_estimator_for_non_compute_nodes, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_inductor_default_comms_ordering, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_overlap_scheduling_via_config, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_raise_comms, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_schedulable_wait, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits_raise_comms, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_basic_all_gather_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_basic_all_reduce_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucket_exposed_with_hidden_single_overlap, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap_blocking_deps_inductor, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap_blocking_no_deps, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_wait_sink, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_with_convert_dtype, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_collective_benchmarking_with_real_pg, 
test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_custom_estimation_with_fake_tensor_mode, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_custom_estimator_for_non_compute_nodes, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_inductor_default_comms_ordering, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_multidtype_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_multiple_hiding_nodes_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_no_bucketing_when_collective_depends_on_hiding_node, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_no_bucketing_with_dependent_hiding_nodes, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_overlap_scheduling_via_config, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_raise_comms, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_reduce_scatter_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_schedulable_wait, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_sink_waits_raise_comms, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_no_bucket, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_single_bucket, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_single_bucket_custom_module_stack_fn, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_custom_estimator_for_non_compute_nodes, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_inductor_default_comms_ordering, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_make_graph_view_and_get_subgraph_by_path, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_make_graph_view_and_get_subgraph_by_path_custom_module_stack_fn, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_manual_reordering_bucketing_pass_separate_buckets, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_overlap_scheduling_via_config, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_raise_comms, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_schedulable_wait, 
test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_sink_waits_raise_comms 2025-12-04T11:31:34.7923255Z 2025-12-04T11:31:34.7923437Z Finished distributed/test_aten_comm_compute_reordering 1/1 ... [2025-12-04 11:31:34.789284][2233037.26430845], took 8.26min 2025-12-04T11:31:34.7923924Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:31:34.7924313Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:31:34.7924571Z Running distributed/checkpoint/test_quantized_hf_storage 1/1 ... [2025-12-04 11:31:34.791914][2233037.26694361] 2025-12-04T11:31:34.7924791Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:31:34.7925216Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_quantized_hf_storage.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:31:34.792076] 2025-12-04T11:31:36.9601776Z 2025-12-04T11:31:36.9602621Z distributed/checkpoint/test_quantized_hf_storage 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_quantized_hf_storage_1.1_e9926b81113a3521_.log 2025-12-04T11:31:36.9604263Z Running 2 items in this shard: test/distributed/checkpoint/test_quantized_hf_storage.py::TestQuantizedHfStorage::test_dequantization, test/distributed/checkpoint/test_quantized_hf_storage.py::TestQuantizedHfStorage::test_dtensor_slice_dequantization_block_alignment 2025-12-04T11:31:36.9605180Z 2025-12-04T11:31:36.9605538Z Finished distributed/checkpoint/test_quantized_hf_storage 1/1 ... [2025-12-04 11:31:36.959824][2233039.434850955], took 0.04min 2025-12-04T11:31:36.9610217Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:31:36.9628406Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:31:36.9631328Z Running distributed/_composable/test_composability/test_pp_composability 1/1 ... [2025-12-04 11:31:36.962922][2233039.437951545] 2025-12-04T11:31:36.9631745Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:31:36.9632478Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/test_composability/test_pp_composability.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:31:36.963099] 2025-12-04T11:31:38.9309438Z 2025-12-04T11:31:38.9310754Z distributed/_composable/test_composability/test_pp_composability 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_composability.test_pp_composability_1.1_ec6ad63c42dbf4a8_.log 2025-12-04T11:31:38.9323150Z Running 26 items in this shard: test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass0_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass0_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass1_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass1_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass2_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass2_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass3_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass3_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass4_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_3d_with_tp_dp_pp_ScheduleClass4_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_pp_and_dcp, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass0_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass0_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass1_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass1_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass2_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass2_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass3_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass3_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass4_bfloat16, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_ScheduleClass4_float32, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_grads_ScheduleClass0, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_grads_ScheduleClass1, 
test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_grads_ScheduleClass2, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_grads_ScheduleClass3, test/distributed/_composable/test_composability/test_pp_composability.py::ComposabilityTest::test_replicate_pp_grads_ScheduleClass4 2025-12-04T11:31:38.9330944Z 2025-12-04T11:31:38.9331228Z Finished distributed/_composable/test_composability/test_pp_composability 1/1 ... [2025-12-04 11:31:38.930589][2233041.405614804], took 0.03min 2025-12-04T11:31:38.9331978Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:31:38.9333941Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:31:38.9335990Z Running distributed/test_device_mesh 1/1 ... [2025-12-04 11:31:38.933494][2233041.408523568] 2025-12-04T11:31:38.9336275Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:31:38.9338126Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_device_mesh.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:31:38.933698] 2025-12-04T11:34:44.3017344Z 2025-12-04T11:34:44.3018276Z distributed/test_device_mesh 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_device_mesh_1.1_ea9ef44acfd5b09b_.log 2025-12-04T11:34:44.3031894Z Running 66 items in this shard: test/distributed/test_device_mesh.py::DeviceMeshTestGlooBackend::test_device_mesh_reuse_default_group, test/distributed/test_device_mesh.py::DeviceMeshSetDeviceTest::test_auto_set_device_from_heuristic, test/distributed/test_device_mesh.py::DeviceMeshSetDeviceTest::test_auto_set_device_from_local_rank, test/distributed/test_device_mesh.py::DeviceMeshSetDeviceTest::test_manual_set_device, test/distributed/test_device_mesh.py::DeviceMeshTest::test_2d_mesh_eager_init_subgroup, test/distributed/test_device_mesh.py::DeviceMeshTest::test_2d_mesh_non_eager_init_subgroup, test/distributed/test_device_mesh.py::DeviceMeshTest::test_assert_invalid_mesh_tensor, test/distributed/test_device_mesh.py::DeviceMeshTest::test_device_mesh_2d, test/distributed/test_device_mesh.py::DeviceMeshTest::test_device_mesh_init_backend, test/distributed/test_device_mesh.py::DeviceMeshTest::test_fake_pg_device_mesh, test/distributed/test_device_mesh.py::DeviceMeshTest::test_from_group_with_global_pg, test/distributed/test_device_mesh.py::DeviceMeshTest::test_from_group_with_invalid_mesh, test/distributed/test_device_mesh.py::DeviceMeshTest::test_get_group_and_get_all_groups, test/distributed/test_device_mesh.py::DeviceMeshTest::test_get_local_rank, test/distributed/test_device_mesh.py::DeviceMeshTest::test_get_local_rank_raises_exception, test/distributed/test_device_mesh.py::DeviceMeshTest::test_get_root_mesh_multiple_independent_meshes, test/distributed/test_device_mesh.py::DeviceMeshTest::test_init_process_group, test/distributed/test_device_mesh.py::DeviceMeshTest::test_raises_invalid_device_type, test/distributed/test_device_mesh.py::DeviceMeshTestNDim::test_device_mesh_hash, test/distributed/test_device_mesh.py::DeviceMeshTestNDim::test_device_mesh_nd, 
test/distributed/test_device_mesh.py::DeviceMeshTestNDim::test_device_mesh_parent_child_hash, test/distributed/test_device_mesh.py::DeviceMeshTestNDim::test_from_group_with_mesh_shape_2d, test/distributed/test_device_mesh.py::DeviceMeshTestNDim::test_from_group_with_mesh_shape_3d, test/distributed/test_device_mesh.py::DeviceMeshTestNDim::test_get_local_rank_3d, test/distributed/test_device_mesh.py::InitDeviceMeshTest::test_backend_override_argument_dict_with_idx_and_backend_eager, test/distributed/test_device_mesh.py::InitDeviceMeshTest::test_backend_override_argument_dict_with_idx_and_backend_lazy, test/distributed/test_device_mesh.py::InitDeviceMeshTest::test_backend_override_argument_dict_with_name_and_options, test/distributed/test_device_mesh.py::InitDeviceMeshTest::test_backend_override_argument_errors, test/distributed/test_device_mesh.py::InitDeviceMeshTest::test_init_device_mesh, test/distributed/test_device_mesh.py::InitDeviceMeshTest::test_raises_duplicate_mesh_dim_names, test/distributed/test_device_mesh.py::InitDeviceMeshTest::test_raises_mesh_shape_mesh_dim_names_mismatch, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_cache_and_reuse_submesh_slice_result, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_concatenate_2d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_concatenate_3d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_flatten_mesh_1d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_flatten_mesh_3d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_flatten_mesh_4d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_get_item_1d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_get_item_2d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_get_item_3d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_get_item_3d_noncontiguous_slicing, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_raises_invalid_mesh_dim_name, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_raises_no_mesh_dim_found, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_reconstruct_mesh_with_flatten_dim, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_unflatten_mesh_2d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_unflatten_mesh_3d, test/distributed/test_device_mesh.py::TestMeshEnv::test_get_all_submeshes, test/distributed/test_device_mesh.py::TestMeshEnv::test_get_mesh_dim_by_name, test/distributed/test_device_mesh.py::TestMeshEnv::test_get_root_mesh, test/distributed/test_device_mesh.py::TestMeshEnv::test_get_root_mesh_dim_exist, test/distributed/test_device_mesh.py::TestMeshEnv::test_get_root_mesh_dim_not_exist, test/distributed/test_device_mesh.py::TestMeshEnv::test_mesh_slice_fake_tensor_mode, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_all_gather_uneven, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_broadcast_1d, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_broadcast_nd, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_reduce_scatter_contiguous, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_reduce_scatter_uneven, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_scatter_1d, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_scatter_nd, 
test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_scatter_uneven, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_check_non_overlap, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_coalesce, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_coalesce_non_coalescible, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_complement_n_group_layout, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_composition, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_remap_to_tensor 2025-12-04T11:34:44.3041693Z 2025-12-04T11:34:44.3041819Z Finished distributed/test_device_mesh 1/1 ... [2025-12-04 11:34:44.302330][2233226.777356222], took 3.09min 2025-12-04T11:34:44.3042268Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:34:44.3046496Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:34:44.3049215Z Running distributed/tensor/parallel/test_tp_style 1/1 ... [2025-12-04 11:34:44.304825][2233226.779854656] 2025-12-04T11:34:44.3049431Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:34:44.3051253Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/parallel/test_tp_style.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:34:44.304992] 2025-12-04T11:35:41.4117175Z 2025-12-04T11:35:41.4118330Z distributed/tensor/parallel/test_tp_style 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_tp_style_1.1_4a09dcf07438a814_.log 2025-12-04T11:35:41.4123850Z Running 18 items in this shard: test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_colwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_colwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_input_multiple_inputs, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_kwargs_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_output, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_rowwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_rowwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_sequence_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_colwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_colwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_input_multiple_inputs, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_kwargs_input, 
test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_output, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_rowwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_rowwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_sequence_parallel_style 2025-12-04T11:35:41.4127536Z 2025-12-04T11:35:41.4127687Z Finished distributed/tensor/parallel/test_tp_style 1/1 ... [2025-12-04 11:35:41.411453][2233283.886478619], took 0.95min 2025-12-04T11:35:41.4128143Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:35:41.4145096Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:35:41.4145346Z Running distributed/checkpoint/test_fsspec 1/1 ... [2025-12-04 11:35:41.414410][2233283.889439383] 2025-12-04T11:35:41.4145552Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:35:41.4147418Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_fsspec.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:35:41.414573] 2025-12-04T11:35:54.9978268Z 2025-12-04T11:35:54.9979422Z distributed/checkpoint/test_fsspec 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsspec_1.1_d665eabe23b91456_.log 2025-12-04T11:35:54.9981917Z Running 3 items in this shard: test/distributed/checkpoint/test_fsspec.py::TestFSSpec::test_fsspec, test/distributed/checkpoint/test_fsspec.py::TestFSSpec::test_overwrite, test/distributed/checkpoint/test_fsspec.py::TestFileSystem::test_remove_on_fail 2025-12-04T11:35:54.9982997Z 2025-12-04T11:35:54.9983374Z Finished distributed/checkpoint/test_fsspec 1/1 ... [2025-12-04 11:35:54.997548][2233297.472573456], took 0.23min 2025-12-04T11:35:54.9989254Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:35:55.0004249Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:35:55.0007373Z Running distributed/tensor/experimental/test_tp_transform 1/1 ... [2025-12-04 11:35:55.000521][2233297.47555117] 2025-12-04T11:35:55.0007875Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:35:55.0008812Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/experimental/test_tp_transform.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:35:55.000688] 2025-12-04T11:36:16.3003043Z 2025-12-04T11:36:16.3004020Z distributed/tensor/experimental/test_tp_transform 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.experimental.test_tp_transform_1.1_b66d3ad1de882687_.log 2025-12-04T11:36:16.3005691Z Running 3 items in this shard: test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_e2e, test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_no_bias, test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_with_uncovered_op 2025-12-04T11:36:16.3006731Z 2025-12-04T11:36:16.3010555Z Finished distributed/tensor/experimental/test_tp_transform 1/1 ... [2025-12-04 11:36:16.299938][2233318.774964446], took 0.35min 2025-12-04T11:36:16.3011421Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:36:16.3029088Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:36:16.3029680Z Running distributed/_composable/test_replicate_mixed_precision 1/1 ... [2025-12-04 11:36:16.302834][2233318.777864221] 2025-12-04T11:36:16.3030093Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:36:16.3031733Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/test_replicate_mixed_precision.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:36:16.303004] 2025-12-04T11:37:01.5390396Z 2025-12-04T11:37:01.5392012Z distributed/_composable/test_replicate_mixed_precision 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_replicate_mixed_precision_1.1_901c6531662eb964_.log 2025-12-04T11:37:01.5398088Z Running 9 items in this shard: test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionTraining::test_compute_dtype, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionTraining::test_grad_acc_with_reduce_dtype, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionTraining::test_reduce_dtype, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_clamp_reduce_dtype, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_dataclass_input, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_float16_on_one_submodule, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_norm_modules_bf16, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_norm_modules_fp16, test/distributed/_composable/test_replicate_mixed_precision.py::TestReplicateMixedPrecisionCasts::test_submodules_with_external_inputs 2025-12-04T11:37:01.5402509Z 2025-12-04T11:37:01.5402908Z Finished distributed/_composable/test_replicate_mixed_precision 1/1 ... 
[2025-12-04 11:37:01.538666][2233364.013691179], took 0.75min 2025-12-04T11:37:01.5403998Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:37:01.5416183Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:37:01.5418289Z Running distributed/_composable/fsdp/test_fully_shard_logging 1/1 ... [2025-12-04 11:37:01.541703][2233364.016732551] 2025-12-04T11:37:01.5418878Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:37:01.5419922Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_logging.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:01.541871] 2025-12-04T11:37:03.4271564Z 2025-12-04T11:37:03.4273377Z distributed/_composable/fsdp/test_fully_shard_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_logging_1.1_1f0d0f4366acbb72_.log 2025-12-04T11:37:03.4274468Z Running 0 items in this shard: 2025-12-04T11:37:03.4274705Z 2025-12-04T11:37:03.4275167Z Finished distributed/_composable/fsdp/test_fully_shard_logging 1/1 ... [2025-12-04 11:37:03.426809][2233365.901834782], took 0.03min 2025-12-04T11:37:03.4280313Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:37:03.4298046Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:37:03.4300186Z Running distributed/_composable/fsdp/test_fully_shard_ignore_params 1/1 ... [2025-12-04 11:37:03.429882][2233365.904911053] 2025-12-04T11:37:03.4300602Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:37:03.4302057Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_ignore_params.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:03.430059] 2025-12-04T11:37:13.4107531Z 2025-12-04T11:37:13.4108718Z distributed/_composable/fsdp/test_fully_shard_ignore_params 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_ignore_params_1.1_34aca2666f0aacd2_.log 2025-12-04T11:37:13.4110199Z Running 1 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_ignore_params.py::TestFullyShardIgnoreParams::test_ddp_A_fsdp_B_ddp_C 2025-12-04T11:37:13.4110685Z 2025-12-04T11:37:13.4111033Z Finished distributed/_composable/fsdp/test_fully_shard_ignore_params 1/1 ... [2025-12-04 11:37:13.410417][2233375.885442558], took 0.17min 2025-12-04T11:37:13.4117160Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:37:13.4136025Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:37:13.4137288Z Running distributed/tensor/test_embedding_ops 1/1 ... 
[2025-12-04 11:37:13.413529][2233375.888558748] 2025-12-04T11:37:13.4137629Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:37:13.4138725Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/test_embedding_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:13.413696] 2025-12-04T11:37:36.1147458Z 2025-12-04T11:37:36.1148575Z distributed/tensor/test_embedding_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_embedding_ops_1.1_a76960e1f3372f3b_.log 2025-12-04T11:37:36.1160425Z Running 8 items in this shard: test/distributed/tensor/test_embedding_ops.py::TestEmbeddingOp::test_multiple_embeddings_rowwise, test/distributed/tensor/test_embedding_ops.py::TestEmbeddingOp::test_sharded_embedding_colwise, test/distributed/tensor/test_embedding_ops.py::TestEmbeddingOp::test_sharded_embedding_colwise_max_norm_errors, test/distributed/tensor/test_embedding_ops.py::TestEmbeddingOp::test_sharded_embedding_rowwise, test/distributed/tensor/test_embedding_ops.py::TestEmbeddingOpWithLocalTensor::test_multiple_embeddings_rowwise, test/distributed/tensor/test_embedding_ops.py::TestEmbeddingOpWithLocalTensor::test_sharded_embedding_colwise, test/distributed/tensor/test_embedding_ops.py::TestEmbeddingOpWithLocalTensor::test_sharded_embedding_colwise_max_norm_errors, test/distributed/tensor/test_embedding_ops.py::TestEmbeddingOpWithLocalTensor::test_sharded_embedding_rowwise 2025-12-04T11:37:36.1164170Z 2025-12-04T11:37:36.1164511Z Finished distributed/tensor/test_embedding_ops 1/1 ... [2025-12-04 11:37:36.114445][2233398.589471073], took 0.38min 2025-12-04T11:37:36.1165558Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:37:36.1176163Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:37:36.1177290Z Running distributed/checkpoint/test_fsdp_optim_state 1/1 ... [2025-12-04 11:37:36.117539][2233398.592569134] 2025-12-04T11:37:36.1177669Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:37:36.1179207Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_fsdp_optim_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:36.117711] 2025-12-04T11:37:53.9106459Z 2025-12-04T11:37:53.9107623Z distributed/checkpoint/test_fsdp_optim_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsdp_optim_state_1.1_458e95b808d71f2a_.log 2025-12-04T11:37:53.9109981Z Running 2 items in this shard: test/distributed/checkpoint/test_fsdp_optim_state.py::FsdpOptimStateCheckpoint::test_load_sharded_optimizer_state_dict_pass_planner_False, test/distributed/checkpoint/test_fsdp_optim_state.py::FsdpOptimStateCheckpoint::test_load_sharded_optimizer_state_dict_pass_planner_True 2025-12-04T11:37:53.9116965Z 2025-12-04T11:37:53.9118084Z Finished distributed/checkpoint/test_fsdp_optim_state 1/1 ... 
[2025-12-04 11:37:53.910518][2233416.385544018], took 0.30min 2025-12-04T11:37:53.9119101Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:37:53.9133417Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:37:53.9135890Z Running distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 ... [2025-12-04 11:37:53.913414][2233416.388443753] 2025-12-04T11:37:53.9136297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:37:53.9138022Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/e2e/test_e2e_save_and_load.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:53.913587] 2025-12-04T11:40:26.5684343Z 2025-12-04T11:40:26.5685040Z distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.e2e.test_e2e_save_and_load_1.1_f7394b30020829e8_.log 2025-12-04T11:40:26.5688749Z Running 19 items in this shard: test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_different_ordered_state_dict_keys, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type0_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type2_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type4_zoc_True, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type5_zoc_True, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_True_async_checkpointer_type1_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_True_async_checkpointer_type3_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type0, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type1, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type2, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type0, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type1, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type2, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_no_dist, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_overwrite, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_partial_load, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_stateful_and_non_stateful_loads, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestNoCPU::test_no_cpu, 
test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestInitStateDict::test_init_state_dict 2025-12-04T11:40:26.5692538Z 2025-12-04T11:40:26.5692703Z Finished distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 ... [2025-12-04 11:40:26.568642][2233569.043666916], took 2.54min 2025-12-04T11:40:26.5693955Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:40:26.5707939Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:40:26.5708183Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T11:40:26.5708377Z Uploading artifacts took 0.00 seconds 2025-12-04T11:40:26.5711792Z Running distributed/_tools/test_runtime_estimator 1/1 ... [2025-12-04 11:40:26.570988][2233569.046017704] 2025-12-04T11:40:26.5712100Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:40:26.5713101Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_tools/test_runtime_estimator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:40:26.571152] 2025-12-04T11:41:06.9472484Z 2025-12-04T11:41:06.9473685Z distributed/_tools/test_runtime_estimator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_runtime_estimator_1.1_dd6d321696145fc7_.log 2025-12-04T11:41:06.9474654Z Running 2 items in this shard: test/distributed/_tools/test_runtime_estimator.py::TestRuntimeEstimator::test_conv_model_runtime, test/distributed/_tools/test_runtime_estimator.py::TestRuntimeEstimator::test_transformer_runtime 2025-12-04T11:41:06.9475178Z 2025-12-04T11:41:06.9475388Z Finished distributed/_tools/test_runtime_estimator 1/1 ... [2025-12-04 11:41:06.946872][2233609.421896898], took 0.67min 2025-12-04T11:41:06.9482447Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:41:06.9501080Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:41:06.9501615Z Running distributed/fsdp/test_fsdp_memory 1/1 ... [2025-12-04 11:41:06.950008][2233609.425038148] 2025-12-04T11:41:06.9502025Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:41:06.9503519Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_memory.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:41:06.950180] 2025-12-04T11:41:33.8100028Z 2025-12-04T11:41:33.8103888Z distributed/fsdp/test_fsdp_memory 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_memory_1.1_17c65618f6fa37a6_.log 2025-12-04T11:41:33.8105035Z Running 2 items in this shard: test/distributed/fsdp/test_fsdp_memory.py::TestFSDPMemory::test_fsdp_memory_ckpt_ckpt, test/distributed/fsdp/test_fsdp_memory.py::TestFSDPMemory::test_fsdp_memory_ckpt_no_ckpt 2025-12-04T11:41:33.8105638Z 2025-12-04T11:41:33.8105938Z Finished distributed/fsdp/test_fsdp_memory 1/1 ... 
[2025-12-04 11:41:33.809619][2233636.284646427], took 0.45min 2025-12-04T11:41:33.8106785Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:41:33.8121283Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:41:33.8122082Z Running distributed/tensor/test_pointwise_ops 1/1 ... [2025-12-04 11:41:33.812035][2233636.287065413] 2025-12-04T11:41:33.8122414Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:41:33.8123486Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/test_pointwise_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:41:33.812204] 2025-12-04T11:41:44.2869101Z 2025-12-04T11:41:44.2870221Z PRINTING LOG FILE of distributed/tensor/test_pointwise_ops 1/1 (test/test-reports/distributed.tensor.test_pointwise_ops_1.1_0fbe5820c1431077_.log) 2025-12-04T11:41:44.2871506Z Test results will be stored in test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-623c7108069e5e74.xml 2025-12-04T11:41:44.2872367Z ============================= test session starts ============================== 2025-12-04T11:41:44.2872984Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:41:44.2873537Z cachedir: .pytest_cache 2025-12-04T11:41:44.2874164Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:41:44.2874833Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:41:44.2875165Z configfile: pytest.ini 2025-12-04T11:41:44.2875806Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:41:44.2876478Z collecting ... 
collected 18 items 2025-12-04T11:41:44.2877602Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:41:44.2882784Z Running 18 items in this shard: test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_activations, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_errors, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_inplace_op_partial_to_replicate, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_out, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_partial, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_replicate_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_activations, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_backward, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_errors, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_inplace_op_partial_to_replicate, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_out, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_replicate_add 2025-12-04T11:41:44.2887938Z 2025-12-04T11:41:44.2888140Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_activations PASSED [0.5338s] [ 5%] 2025-12-04T11:41:44.2888724Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout SKIPPED [0.0002s] (testing RNG based ops is broken: https://github.com/pytorch/PiPPy/issues/494) [ 11%] 2025-12-04T11:41:44.2889304Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward PASSED [0.0360s] [ 16%] 2025-12-04T11:41:44.2889803Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_errors PASSED [0.0259s] [ 22%] 2025-12-04T11:41:44.2890275Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_inplace_op_partial_to_replicate PASSED [0.0173s] [ 27%] 2025-12-04T11:41:44.2890728Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_out PASSED [0.0122s] [ 33%] 2025-12-04T11:41:44.2891145Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_partial PASSED [0.0679s] [ 38%] 2025-12-04T11:41:44.2891573Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_add PASSED [0.0080s] [ 44%] 2025-12-04T11:41:44.2892031Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_replicate_add PASSED [0.0246s] [ 50%] 2025-12-04T11:41:44.2892520Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_activations PASSED [0.1165s] [ 55%] 2025-12-04T11:41:44.2893201Z 
distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout SKIPPED [0.0001s] (testing RNG based ops is broken: https://github.com/pytorch/PiPPy/issues/494) [ 61%] 2025-12-04T11:41:44.2893841Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_backward PASSED [0.0309s] [ 66%] 2025-12-04T11:41:44.2896993Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_errors PASSED [0.0144s] [ 72%] 2025-12-04T11:41:44.2897625Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_inplace_op_partial_to_replicate PASSED [0.0273s] [ 77%] 2025-12-04T11:41:44.2898060Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_out PASSED [0.0357s] [ 83%] 2025-12-04T11:41:44.2898450Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial FAILED [0.0092s] [ 88%] 2025-12-04T11:41:44.2898672Z 2025-12-04T11:41:44.2898748Z =================================== FAILURES =================================== 2025-12-04T11:41:44.2898971Z ____________ DistElementwiseOpsTestWithLocalTensor.test_mul_partial ____________ 2025-12-04T11:41:44.2899180Z Traceback (most recent call last): 2025-12-04T11:41:44.2899435Z File "/var/lib/jenkins/pytorch/test/distributed/tensor/test_pointwise_ops.py", line 320, in test_mul_partial 2025-12-04T11:41:44.2899773Z d_1 = DTensor.from_local(torch.ones(2, 2), device_mesh, [Partial()]) 2025-12-04T11:41:44.2899953Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2900225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 441, in from_local 2025-12-04T11:41:44.2900568Z return _FromTorchTensor.apply( # pyre-ignore[16]: autograd func 2025-12-04T11:41:44.2900759Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2901001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 583, in apply 2025-12-04T11:41:44.2901272Z return super().apply(*args, **kwargs) # type: ignore[misc] 2025-12-04T11:41:44.2901440Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2901691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 160, in forward 2025-12-04T11:41:44.2901947Z if device_mesh.get_coordinate() is None: 2025-12-04T11:41:44.2902091Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2902364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/_local_tensor/__init__.py", line 1494, in get_coordinate 2025-12-04T11:41:44.2902665Z assert lm is not None, "Unexpectedly not in LocalTensorMode" 2025-12-04T11:41:44.2902826Z ^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2902966Z AssertionError: Unexpectedly not in LocalTensorMode 2025-12-04T11:41:44.2903077Z 2025-12-04T11:41:44.2903160Z To execute this test, run the following from the base repo dir: 2025-12-04T11:41:44.2903525Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/tensor/test_pointwise_ops.py DistElementwiseOpsTestWithLocalTensor.test_mul_partial 2025-12-04T11:41:44.2903813Z 2025-12-04T11:41:44.2903914Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:41:44.2904333Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-623c7108069e5e74.xml - 
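The failure above originates in DTensor.from_local: DeviceMesh.get_coordinate() is routed through the LocalTensor shim, which asserts the caller is inside LocalTensorMode. For reference, a minimal single-rank sketch of the same call pattern outside any LocalTensorMode wrapper, assuming a gloo group on CPU and free port 29500 (both are illustration choices, not taken from this run):

import os
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial

# Single-rank group so the sketch is self-contained; the CI test wraps
# this same call in LocalTensorMode, which is where the assert fires.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

mesh = init_device_mesh("cpu", (1,))
# Same pattern as test_mul_partial: wrap a local tensor as a DTensor
# whose values are pending reduction (Partial placement).
d_1 = DTensor.from_local(torch.ones(2, 2), mesh, [Partial()])
print(d_1.full_tensor())

dist.destroy_process_group()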
2025-12-04T11:41:44.2904714Z =========================== short test summary info ============================ 2025-12-04T11:41:44.2905072Z FAILED [0.0092s] distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial - AssertionError: Unexpectedly not in LocalTensorMode 2025-12-04T11:41:44.2905359Z 2025-12-04T11:41:44.2905438Z To execute this test, run the following from the base repo dir: 2025-12-04T11:41:44.2905794Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/tensor/test_pointwise_ops.py DistElementwiseOpsTestWithLocalTensor.test_mul_partial 2025-12-04T11:41:44.2906076Z 2025-12-04T11:41:44.2906176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:41:44.2906389Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:41:44.2906577Z =================== 1 failed, 13 passed, 2 skipped in 0.98s ==================== 2025-12-04T11:41:44.2906736Z Got exit code 1 2025-12-04T11:41:44.2906842Z Retrying single test... 2025-12-04T11:41:44.2907196Z Test results will be stored in test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-e3db1fdc4835f70e.xml 2025-12-04T11:41:44.2907526Z ============================= test session starts ============================== 2025-12-04T11:41:44.2907740Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:41:44.2907930Z cachedir: .pytest_cache 2025-12-04T11:41:44.2908153Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:41:44.2908393Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:41:44.2908512Z configfile: pytest.ini 2025-12-04T11:41:44.2908738Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:41:44.2909010Z collecting ... collected 18 items / 17 deselected / 1 selected 2025-12-04T11:41:44.2909337Z stepcurrent: skipping 15 already run items. 
Running only test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial 2025-12-04T11:41:44.2909656Z Running 1 items in this shard 2025-12-04T11:41:44.2909767Z 2025-12-04T11:41:44.2909937Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial FAILED [0.3263s] [100%] 2025-12-04T11:41:44.2910137Z 2025-12-04T11:41:44.2910197Z =================================== FAILURES =================================== 2025-12-04T11:41:44.2910389Z ____________ DistElementwiseOpsTestWithLocalTensor.test_mul_partial ____________ 2025-12-04T11:41:44.2910575Z Traceback (most recent call last): 2025-12-04T11:41:44.2910801Z File "/var/lib/jenkins/pytorch/test/distributed/tensor/test_pointwise_ops.py", line 320, in test_mul_partial 2025-12-04T11:41:44.2911062Z d_1 = DTensor.from_local(torch.ones(2, 2), device_mesh, [Partial()]) 2025-12-04T11:41:44.2911226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2911467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 441, in from_local 2025-12-04T11:41:44.2911739Z return _FromTorchTensor.apply( # pyre-ignore[16]: autograd func 2025-12-04T11:41:44.2911905Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2912119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 583, in apply 2025-12-04T11:41:44.2912358Z return super().apply(*args, **kwargs) # type: ignore[misc] 2025-12-04T11:41:44.2912509Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2912744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 160, in forward 2025-12-04T11:41:44.2912974Z if device_mesh.get_coordinate() is None: 2025-12-04T11:41:44.2913101Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2913346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/_local_tensor/__init__.py", line 1494, in get_coordinate 2025-12-04T11:41:44.2913628Z assert lm is not None, "Unexpectedly not in LocalTensorMode" 2025-12-04T11:41:44.2913774Z ^^^^^^^^^^^^^^ 2025-12-04T11:41:44.2913904Z AssertionError: Unexpectedly not in LocalTensorMode 2025-12-04T11:41:44.2914003Z 2025-12-04T11:41:44.2914078Z To execute this test, run the following from the base repo dir: 2025-12-04T11:41:44.2914412Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/tensor/test_pointwise_ops.py DistElementwiseOpsTestWithLocalTensor.test_mul_partial 2025-12-04T11:41:44.2914670Z 2025-12-04T11:41:44.2914759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:41:44.2915136Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-e3db1fdc4835f70e.xml - 2025-12-04T11:41:44.2915537Z =========================== short test summary info ============================ 2025-12-04T11:41:44.2915866Z FAILED [0.3263s] distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial - AssertionError: Unexpectedly not in LocalTensorMode 2025-12-04T11:41:44.2916129Z 2025-12-04T11:41:44.2916204Z To execute this test, run the following from the base repo dir: 2025-12-04T11:41:44.2916531Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/tensor/test_pointwise_ops.py DistElementwiseOpsTestWithLocalTensor.test_mul_partial 
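Both passing re-runs further down end with a ProcessGroup warning about unwaited collective calls. As a hedged sketch of the wait_tensor() discipline that warning asks for, using the private torch.distributed._functional_collectives wrappers (an internal module, subject to change) on a self-contained single-rank gloo group with an assumed free port 29501:

import os
import torch
import torch.distributed as dist
from torch.distributed._functional_collectives import all_reduce, wait_tensor

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

# Functional collectives return async tensors; resolving each result
# with wait_tensor() before use is what the ~WorkRegistry check audits.
out = wait_tensor(all_reduce(torch.ones(4), "sum", dist.group.WORLD))
print(out)

dist.destroy_process_group()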
2025-12-04T11:41:44.2916789Z 2025-12-04T11:41:44.2916877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:41:44.2917063Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:41:44.2917225Z ======================= 1 failed, 17 deselected in 0.34s ======================= 2025-12-04T11:41:44.2917365Z Got exit code 1 2025-12-04T11:41:44.2917467Z Retrying single test... 2025-12-04T11:41:44.2917774Z Test results will be stored in test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-b5af8afd1d357194.xml 2025-12-04T11:41:44.2918073Z ============================= test session starts ============================== 2025-12-04T11:41:44.2918281Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:41:44.2918468Z cachedir: .pytest_cache 2025-12-04T11:41:44.2918690Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:41:44.2918929Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:41:44.2919047Z configfile: pytest.ini 2025-12-04T11:41:44.2919272Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:41:44.2919542Z collecting ... collected 18 items / 17 deselected / 1 selected 2025-12-04T11:41:44.2919907Z stepcurrent: skipping 15 already run items. Running only test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial 2025-12-04T11:41:44.2920199Z Running 1 items in this shard 2025-12-04T11:41:44.2920274Z 2025-12-04T11:41:44.2920440Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial PASSED [0.6304s] [100%] 2025-12-04T11:41:44.2920643Z 2025-12-04T11:41:44.2920895Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-b5af8afd1d357194.xml - 2025-12-04T11:41:44.2921242Z ======================= 1 passed, 17 deselected in 0.64s ======================= 2025-12-04T11:41:44.2921557Z [W1204 11:41:41.082134795 ProcessGroup.cpp:367] Warning: At the time of process termination, there are still 8 unwaited collective calls. Please review your program to ensure that: 2025-12-04T11:41:44.2921923Z 1. c10d_functional.wait_tensor() is invoked on all tensors returned from c10d_functional collective, 2025-12-04T11:41:44.2922296Z 2. c10d_functional.wait_tensor() is invoked on all output tensors of async_op=True torch.distributed collective called under `with allow_inflight_collective_as_graph_input_ctx():`, 2025-12-04T11:41:44.2922643Z before the output tensors of the collective are used. 
(function ~WorkRegistry) 2025-12-04T11:41:44.2922809Z Got exit code 0 2025-12-04T11:41:44.2922945Z Test succeeded in new process, continuing with the rest of the tests 2025-12-04T11:41:44.2923267Z Test results will be stored in test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-281800cbb34ef3ed.xml 2025-12-04T11:41:44.2923565Z ============================= test session starts ============================== 2025-12-04T11:41:44.2923772Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:41:44.2923995Z cachedir: .pytest_cache 2025-12-04T11:41:44.2924219Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:41:44.2924457Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:41:44.2924575Z configfile: pytest.ini 2025-12-04T11:41:44.2924799Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:41:44.2925068Z collecting ... collected 18 items / 16 deselected / 2 selected 2025-12-04T11:41:44.2925230Z stepcurrent: skipping 16 already run items. 2025-12-04T11:41:44.2925361Z Running 2 items in this shard 2025-12-04T11:41:44.2925435Z 2025-12-04T11:41:44.2925602Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_add PASSED [0.3712s] [ 50%] 2025-12-04T11:41:44.2925983Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_replicate_add PASSED [0.1924s] [100%] 2025-12-04T11:41:44.2926200Z 2025-12-04T11:41:44.2926483Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-281800cbb34ef3ed.xml - 2025-12-04T11:41:44.2926829Z ======================= 2 passed, 16 deselected in 0.57s ======================= 2025-12-04T11:41:44.2927147Z [W1204 11:41:44.596583091 ProcessGroup.cpp:367] Warning: At the time of process termination, there are still 4 unwaited collective calls. Please review your program to ensure that: 2025-12-04T11:41:44.2927508Z 1. c10d_functional.wait_tensor() is invoked on all tensors returned from c10d_functional collective, 2025-12-04T11:41:44.2927876Z 2. c10d_functional.wait_tensor() is invoked on all output tensors of async_op=True torch.distributed collective called under `with allow_inflight_collective_as_graph_input_ctx():`, 2025-12-04T11:41:44.2928219Z before the output tensors of the collective are used. (function ~WorkRegistry) 2025-12-04T11:41:44.2928587Z The following tests failed and then succeeded when run in a new process['test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial'] 2025-12-04T11:41:44.2928853Z 2025-12-04T11:41:44.2929061Z FINISHED PRINTING LOG FILE of distributed/tensor/test_pointwise_ops 1/1 (test/test-reports/distributed.tensor.test_pointwise_ops_1.1_0fbe5820c1431077_.log) 2025-12-04T11:41:44.2929300Z 2025-12-04T11:41:44.2929435Z Finished distributed/tensor/test_pointwise_ops 1/1 ... 
[2025-12-04 11:41:44.286579][2233646.761603667], took 0.17min 2025-12-04T11:41:44.2929912Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:41:44.2930307Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:41:44.2930557Z Running distributed/checkpoint/test_compatibility 1/1 ... [2025-12-04 11:41:44.289648][2233646.764677789] 2025-12-04T11:41:44.2930771Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:41:44.2931190Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_compatibility.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:41:44.289827] 2025-12-04T11:41:46.5080083Z 2025-12-04T11:41:46.5080953Z distributed/checkpoint/test_compatibility 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_compatibility_1.1_6290d74e7153c6d5_.log 2025-12-04T11:41:46.5082855Z Running 4 items in this shard: test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_metadata, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_sharded_tensor_dependency, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_storage_meta, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_with_v_2_3 2025-12-04T11:41:46.5083848Z 2025-12-04T11:41:46.5084070Z Finished distributed/checkpoint/test_compatibility 1/1 ... [2025-12-04 11:41:46.507660][2233648.982686848], took 0.04min 2025-12-04T11:41:46.5089494Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:41:46.5104088Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:41:46.5106037Z Running distributed/_tools/test_mem_tracker 1/1 ... [2025-12-04 11:41:46.510460][2233648.985489886] 2025-12-04T11:41:46.5106293Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:41:46.5107416Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_tools/test_mem_tracker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:41:46.510606] 2025-12-04T11:41:53.4420679Z 2025-12-04T11:41:53.4421486Z distributed/_tools/test_mem_tracker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_mem_tracker_1.1_6203452a99bfd746_.log 2025-12-04T11:41:53.4422624Z Running 3 items in this shard: test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_accelerator_tracker_equivalence, test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_tracker_attribution, test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_tracker_with_activation_checkpointing 2025-12-04T11:41:53.4423357Z 2025-12-04T11:41:53.4423572Z Finished distributed/_tools/test_mem_tracker 1/1 ... 
[2025-12-04 11:41:53.441729][2233655.916756076], took 0.12min 2025-12-04T11:41:53.4429440Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:41:53.4446880Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:41:53.4448185Z Running distributed/elastic/test_control_plane 1/1 ... [2025-12-04 11:41:53.444626][2233655.919655552] 2025-12-04T11:41:53.4448471Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:41:53.4449200Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/elastic/test_control_plane.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:41:53.444797] 2025-12-04T11:41:55.8630118Z 2025-12-04T11:41:55.8631287Z distributed/elastic/test_control_plane 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.test_control_plane_1.1_1a1f699f40696ae1_.log 2025-12-04T11:41:55.8635727Z Running 10 items in this shard: test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle_with_json, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle_with_params, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_traceback, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_get_handler_names, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_get_handler_nonexistant, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_run_handler, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_tcp, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_wait_counter_values, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_worker_server 2025-12-04T11:41:55.8638321Z 2025-12-04T11:41:55.8639086Z Finished distributed/elastic/test_control_plane 1/1 ... [2025-12-04 11:41:55.862642][2233658.337668109], took 0.04min 2025-12-04T11:41:55.8639992Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:41:55.8658716Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:41:55.8659929Z Running distributed/fsdp/test_fsdp_overlap 1/1 ... [2025-12-04 11:41:55.865828][2233658.340857659] 2025-12-04T11:41:55.8660231Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:41:55.8661663Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_overlap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:41:55.866001] 2025-12-04T11:43:10.4392654Z 2025-12-04T11:43:10.4396169Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_overlap 1/1 (test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_83469f07f30a7891_.log) 2025-12-04T11:43:10.4398012Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-7abdd4c66af90b2a.xml 2025-12-04T11:43:10.4398644Z ============================= test session starts ============================== 2025-12-04T11:43:10.4399105Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:43:10.4399503Z cachedir: .pytest_cache 2025-12-04T11:43:10.4400046Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:43:10.4400557Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:43:10.4400815Z configfile: pytest.ini 2025-12-04T11:43:10.4401292Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:43:10.4401799Z collecting ... collected 1 item 2025-12-04T11:43:10.4402103Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:43:10.4402722Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T11:43:10.4403169Z 2025-12-04T11:43:10.4403824Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 11:41:57.571000 207482 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 207551 2025-12-04T11:43:10.4405191Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T11:43:10.4405960Z _init_core_state( 2025-12-04T11:43:10.4407290Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
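The UserWarning above names its own fix: pass device_id so FSDP moves the CPU-resident module to the GPU before running sharding initialization. A hedged single-rank sketch of that suggestion (assumes one visible GPU, an NCCL backend, and free port 29502; this is not the test's actual code):

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group("nccl", rank=0, world_size=1)

# device_id makes FSDP move the module to this GPU for sharding init,
# avoiding the _warn_cpu_init() path flagged in the log.
model = FSDP(torch.nn.Linear(8, 8), device_id=torch.cuda.current_device())

dist.destroy_process_group()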
2025-12-04T11:43:10.4408268Z _warn_cpu_init() 2025-12-04T11:43:10.4408577Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:43:10.4409090Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:43:10.4409874Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4410738Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:43:10.4411467Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4412140Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:43:10.4412803Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4413493Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:43:10.4414195Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4414926Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:43:10.4415620Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4416294Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:43:10.4416968Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4417645Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:43:10.4418453Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 
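The RuntimeError above is raised by the mem_leak_check policy, which snapshots CUDA caching-allocator and driver memory around the test body and fails if the totals grow. A rough sketch of the underlying idea only; the real check lives in torch/testing/_internal/common_utils.py and is more involved:

import torch

torch.cuda.init()
allocator_before = torch.cuda.memory_allocated(0)
driver_free_before, _ = torch.cuda.mem_get_info(0)

# ... test body would run here ...

torch.cuda.synchronize()
allocator_after = torch.cuda.memory_allocated(0)
driver_free_after, _ = torch.cuda.mem_get_info(0)
if allocator_after > allocator_before or driver_free_after < driver_free_before:
    # Mirrors the "Caching allocator allocated memory was X and is now Y"
    # message in the log, in simplified form.
    raise RuntimeError(
        f"possible leak: allocator {allocator_before} -> {allocator_after} bytes"
    )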
2025-12-04T11:43:10.4419186Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:43:10.4419590Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4420317Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4420908Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:43:10.4421330Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4421827Z [rank0]:E1204 11:42:17.853000 207551 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:43:10.4422107Z dist init r=0, world=1 2025-12-04T11:43:10.4422188Z 2025-12-04T11:43:10.4422237Z rank0: 2025-12-04T11:43:10.4422468Z e1: {'cpu_iter': 0.0006606836000001337, 'cpu_wait': 3.231699999979298e-05, 'gpu_compute': 0.017271899990737437, 'gpu_total': 0.267126002907753} 2025-12-04T11:43:10.4422891Z e2: {'cpu_iter': 0.0015144582999994326, 'cpu_wait': 1.9200000000019203e-05, 'gpu_compute': 0.03600710011087358, 'gpu_total': 0.6206221044063568} 2025-12-04T11:43:10.4423264Z e3: {'cpu_iter': 0.001286529299999728, 'cpu_wait': 0.39536339550000027, 'gpu_compute': 396.76391143798827, 'gpu_total': 397.0451965332031} 2025-12-04T11:43:10.4423621Z e4: {'cpu_iter': 0.0026327003999998765, 'cpu_wait': 0.7501441684, 'gpu_compute': 396.72158279418943, 'gpu_total': 397.1461456298828} 2025-12-04T11:43:10.4424216Z [rank0]:[W1204 11:42:18.528088953 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:43:10.4424697Z FAILED [21.6203s] [100%] 2025-12-04T11:43:10.4424771Z 2025-12-04T11:43:10.4424843Z =================================== FAILURES =================================== 2025-12-04T11:43:10.4425070Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T11:43:10.4425286Z Traceback (most recent call last): 2025-12-04T11:43:10.4425582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:43:10.4425909Z self._join_processes(fn) 2025-12-04T11:43:10.4426194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:43:10.4426498Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:43:10.4426811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:43:10.4427115Z raise RuntimeError(error) 2025-12-04T11:43:10.4427274Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:43:10.4427436Z Traceback (most recent call last): 2025-12-04T11:43:10.4427674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4427918Z getattr(self, test_name)() 2025-12-04T11:43:10.4428150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4428383Z fn() 2025-12-04T11:43:10.4428586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4428815Z method(*args, **kwargs) 2025-12-04T11:43:10.4429036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4429264Z method(*args, **kwargs) 2025-12-04T11:43:10.4429481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4429762Z with policy(): 2025-12-04T11:43:10.4429974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4430202Z raise RuntimeError(msg) 2025-12-04T11:43:10.4430615Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:43:10.4430991Z 2025-12-04T11:43:10.4431065Z To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4431403Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4431667Z 2025-12-04T11:43:10.4431756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4431881Z 2025-12-04T11:43:10.4431883Z 2025-12-04T11:43:10.4431962Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:43:10.4432196Z Process 0 terminated with exit code 10, terminating remaining processes. 
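The ProcessGroupNCCL warning in the captured output above concerns teardown: each worker should tear down the default group before exiting. A minimal sketch of the shutdown order the linked docs recommend (the barrier is an optional sync, a common pattern rather than a requirement):

import torch.distributed as dist

# ... training / test body ...

# Let outstanding collectives drain, then release communicator
# resources before the process exits, as the warning requests.
if dist.is_initialized():
    dist.barrier()
    dist.destroy_process_group()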
2025-12-04T11:43:10.4432572Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-7abdd4c66af90b2a.xml - 2025-12-04T11:43:10.4432910Z =========================== short test summary info ============================ 2025-12-04T11:43:10.4433255Z FAILED [21.6203s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:43:10.4433582Z Traceback (most recent call last): 2025-12-04T11:43:10.4433829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4434070Z getattr(self, test_name)() 2025-12-04T11:43:10.4434303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4434563Z fn() 2025-12-04T11:43:10.4434761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4434990Z method(*args, **kwargs) 2025-12-04T11:43:10.4435210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4435437Z method(*args, **kwargs) 2025-12-04T11:43:10.4435653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4435879Z with policy(): 2025-12-04T11:43:10.4436087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4436315Z raise RuntimeError(msg) 2025-12-04T11:43:10.4436726Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:43:10.4437106Z 2025-12-04T11:43:10.4437182Z To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4437517Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4437776Z 2025-12-04T11:43:10.4437866Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4438053Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:43:10.4438209Z ============================== 1 failed in 21.78s ============================== 2025-12-04T11:43:10.4438339Z Got exit code 1 2025-12-04T11:43:10.4438437Z Retrying single test... 
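"Retrying single test..." is the harness isolating the failure: the failed node id is re-run alone in a fresh interpreter, and a pass there marks the test flaky rather than fatal (as the earlier "Test succeeded in new process" line showed for test_mul_partial). A hedged, generic sketch of that pattern with a hypothetical helper; this is not PyTorch's actual run_test.py logic:

import subprocess
import sys

def retry_in_new_process(test_id: str) -> bool:
    """Re-run one pytest node id in a fresh interpreter; True if it passes."""
    result = subprocess.run([sys.executable, "-m", "pytest", "-x", test_id])
    return result.returncode == 0

# Example (same node id as the failure above):
# retry_in_new_process(
#     "test/distributed/fsdp/test_fsdp_overlap.py"
#     "::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda")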
2025-12-04T11:43:10.4438699Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-387e5416ae452882.xml 2025-12-04T11:43:10.4438990Z ============================= test session starts ============================== 2025-12-04T11:43:10.4439200Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:43:10.4439387Z cachedir: .pytest_cache 2025-12-04T11:43:10.4439607Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:43:10.4439881Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:43:10.4439999Z configfile: pytest.ini 2025-12-04T11:43:10.4440224Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:43:10.4440463Z collecting ... collected 1 item 2025-12-04T11:43:10.4440808Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T11:43:10.4441104Z Running 1 items in this shard 2025-12-04T11:43:10.4441175Z 2025-12-04T11:43:10.4441481Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 11:42:21.444000 207634 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 207703 2025-12-04T11:43:10.4442119Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T11:43:10.4442482Z _init_core_state( 2025-12-04T11:43:10.4443118Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T11:43:10.4443784Z _warn_cpu_init() 2025-12-04T11:43:10.4443985Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:43:10.4444322Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:43:10.4444809Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4445288Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:43:10.4445767Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4446212Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:43:10.4446650Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4447113Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:43:10.4447577Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4448037Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:43:10.4448499Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4448946Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:43:10.4449397Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4449894Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:43:10.4450580Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 
2025-12-04T11:43:10.4451200Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:43:10.4451551Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4452138Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4452643Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:43:10.4453034Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4453444Z [rank0]:E1204 11:42:41.826000 207703 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:43:10.4453685Z dist init r=0, world=1 2025-12-04T11:43:10.4453750Z 2025-12-04T11:43:10.4453787Z rank0: 2025-12-04T11:43:10.4453986Z e1: {'cpu_iter': 0.0007067017000002452, 'cpu_wait': 1.776499999976977e-05, 'gpu_compute': 0.017191799962893127, 'gpu_total': 0.2791459023952484} 2025-12-04T11:43:10.4454312Z e2: {'cpu_iter': 0.0016124603000001515, 'cpu_wait': 1.802500000014362e-05, 'gpu_compute': 0.03632709993980825, 'gpu_total': 0.6383976995944977} 2025-12-04T11:43:10.4454631Z e3: {'cpu_iter': 0.001382821399999834, 'cpu_wait': 0.3956417296000005, 'gpu_compute': 397.07031707763673, 'gpu_total': 397.3831848144531} 2025-12-04T11:43:10.4454940Z e4: {'cpu_iter': 0.002960046800000171, 'cpu_wait': 0.7506494168999993, 'gpu_compute': 397.1181816101074, 'gpu_total': 397.5279571533203} 2025-12-04T11:43:10.4455450Z [rank0]:[W1204 11:42:42.627813770 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:43:10.4455858Z FAILED [21.9205s] [100%] 2025-12-04T11:43:10.4455926Z 2025-12-04T11:43:10.4455982Z =================================== FAILURES =================================== 2025-12-04T11:43:10.4456179Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T11:43:10.4456362Z Traceback (most recent call last): 2025-12-04T11:43:10.4456609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:43:10.4456853Z self._join_processes(fn) 2025-12-04T11:43:10.4457096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:43:10.4457358Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:43:10.4457622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:43:10.4457879Z raise RuntimeError(error) 2025-12-04T11:43:10.4458030Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:43:10.4458189Z Traceback (most recent call last): 2025-12-04T11:43:10.4458426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4458666Z getattr(self, test_name)() 2025-12-04T11:43:10.4458940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4459172Z fn() 2025-12-04T11:43:10.4459374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4459602Z method(*args, **kwargs) 2025-12-04T11:43:10.4459872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4460100Z method(*args, **kwargs) 2025-12-04T11:43:10.4460317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4460540Z with policy(): 2025-12-04T11:43:10.4460750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4460978Z raise RuntimeError(msg) 2025-12-04T11:43:10.4461392Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:43:10.4461799Z 2025-12-04T11:43:10.4461874Z To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4462209Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4462472Z 2025-12-04T11:43:10.4462559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4462685Z 2025-12-04T11:43:10.4462687Z 2025-12-04T11:43:10.4462766Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:43:10.4462967Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:43:10.4463337Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-387e5416ae452882.xml - 2025-12-04T11:43:10.4463677Z =========================== short test summary info ============================ 2025-12-04T11:43:10.4464024Z FAILED [21.9205s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:43:10.4464352Z Traceback (most recent call last): 2025-12-04T11:43:10.4464595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4464840Z getattr(self, test_name)() 2025-12-04T11:43:10.4465073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4465304Z fn() 2025-12-04T11:43:10.4465508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4465740Z method(*args, **kwargs) 2025-12-04T11:43:10.4465959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4466188Z method(*args, **kwargs) 2025-12-04T11:43:10.4466407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4466632Z with policy(): 2025-12-04T11:43:10.4466842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4467073Z raise RuntimeError(msg) 2025-12-04T11:43:10.4467515Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:43:10.4467892Z 2025-12-04T11:43:10.4467967Z To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4468301Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4468561Z 2025-12-04T11:43:10.4468650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4468835Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:43:10.4468992Z ============================== 1 failed in 22.08s ============================== 2025-12-04T11:43:10.4469121Z Got exit code 1 2025-12-04T11:43:10.4469220Z Retrying single test... 
2025-12-04T11:43:10.4469487Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-6bbd7f74149e204d.xml 2025-12-04T11:43:10.4469837Z ============================= test session starts ============================== 2025-12-04T11:43:10.4470047Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:43:10.4470236Z cachedir: .pytest_cache 2025-12-04T11:43:10.4470458Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:43:10.4470693Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:43:10.4470812Z configfile: pytest.ini 2025-12-04T11:43:10.4471037Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:43:10.4471277Z collecting ... collected 1 item 2025-12-04T11:43:10.4471571Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T11:43:10.4471868Z Running 1 items in this shard 2025-12-04T11:43:10.4471941Z 2025-12-04T11:43:10.4472252Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 11:42:45.638000 207786 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 207855 2025-12-04T11:43:10.4472885Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T11:43:10.4473253Z _init_core_state( 2025-12-04T11:43:10.4473888Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T11:43:10.4474527Z _warn_cpu_init() 2025-12-04T11:43:10.4474728Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:43:10.4475068Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:43:10.4475555Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4476034Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:43:10.4476540Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4476989Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:43:10.4477426Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4477887Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:43:10.4478350Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4478809Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:43:10.4479300Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4479814Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:43:10.4480269Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4480733Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:43:10.4481393Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 
2025-12-04T11:43:10.4482012Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:43:10.4482364Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4482951Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4483453Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:43:10.4483818Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4484230Z [rank0]:E1204 11:43:05.942000 207855 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:43:10.4484470Z dist init r=0, world=1 2025-12-04T11:43:10.4484533Z 2025-12-04T11:43:10.4484570Z rank0: 2025-12-04T11:43:10.4484773Z e1: {'cpu_iter': 0.0006614596999998668, 'cpu_wait': 3.0742000000216764e-05, 'gpu_compute': 0.017255499912425875, 'gpu_total': 0.2598341077566147} 2025-12-04T11:43:10.4485104Z e2: {'cpu_iter': 0.0015938174000002191, 'cpu_wait': 1.8791000000106804e-05, 'gpu_compute': 0.03488710015080869, 'gpu_total': 0.6186420977115631} 2025-12-04T11:43:10.4485455Z e3: {'cpu_iter': 0.0012687551000002627, 'cpu_wait': 0.39608086109999974, 'gpu_compute': 397.32046127319336, 'gpu_total': 397.59949951171876} 2025-12-04T11:43:10.4485773Z e4: {'cpu_iter': 0.0026688292999995865, 'cpu_wait': 0.7513549821000002, 'gpu_compute': 397.25318298339846, 'gpu_total': 397.6144287109375} 2025-12-04T11:43:10.4486291Z [rank0]:[W1204 11:43:06.594871444 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:43:10.4486699Z FAILED [21.6232s] [100%] 2025-12-04T11:43:10.4486767Z 2025-12-04T11:43:10.4486823Z =================================== FAILURES =================================== 2025-12-04T11:43:10.4487018Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T11:43:10.4487203Z Traceback (most recent call last): 2025-12-04T11:43:10.4487449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:43:10.4487694Z self._join_processes(fn) 2025-12-04T11:43:10.4487971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:43:10.4488233Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:43:10.4488498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:43:10.4488755Z raise RuntimeError(error) 2025-12-04T11:43:10.4488906Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:43:10.4489071Z Traceback (most recent call last): 2025-12-04T11:43:10.4489309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4489550Z getattr(self, test_name)() 2025-12-04T11:43:10.4489820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4490053Z fn() 2025-12-04T11:43:10.4490254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4490485Z method(*args, **kwargs) 2025-12-04T11:43:10.4490705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4490934Z method(*args, **kwargs) 2025-12-04T11:43:10.4491153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4491379Z with policy(): 2025-12-04T11:43:10.4491588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4491819Z raise RuntimeError(msg) 2025-12-04T11:43:10.4492229Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:43:10.4492609Z 2025-12-04T11:43:10.4492683Z To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4493019Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4493280Z 2025-12-04T11:43:10.4493367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4493492Z 2025-12-04T11:43:10.4493494Z 2025-12-04T11:43:10.4493574Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:43:10.4493772Z Process 0 terminated with exit code 10, terminating remaining processes. 
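The ProcessGroupNCCL warning above is emitted when a process exits while the default process group is still alive. A minimal sketch of the explicit teardown it asks for, using a single-rank gloo group so it runs without a GPU; the rendezvous address and port are placeholder values:

import os
import torch.distributed as dist

def main() -> None:
    # Placeholder single-rank rendezvous so the sketch runs on one machine.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    try:
        dist.barrier()  # stand-in for the collectives in the test body
    finally:
        dist.destroy_process_group()  # the explicit teardown the warning asks for

if __name__ == "__main__":
    main()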
2025-12-04T11:43:10.4494169Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-6bbd7f74149e204d.xml - 2025-12-04T11:43:10.4494509Z =========================== short test summary info ============================ 2025-12-04T11:43:10.4494853Z FAILED [21.6232s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:43:10.4495179Z Traceback (most recent call last): 2025-12-04T11:43:10.4495421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:43:10.4495663Z getattr(self, test_name)() 2025-12-04T11:43:10.4495896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:43:10.4496126Z fn() 2025-12-04T11:43:10.4496329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4496588Z method(*args, **kwargs) 2025-12-04T11:43:10.4496807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:43:10.4497035Z method(*args, **kwargs) 2025-12-04T11:43:10.4497252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:43:10.4497477Z with policy(): 2025-12-04T11:43:10.4497688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:43:10.4497919Z raise RuntimeError(msg) 2025-12-04T11:43:10.4498334Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:43:10.4498711Z 2025-12-04T11:43:10.4498790Z To execute this test, run the following from the base repo dir: 2025-12-04T11:43:10.4499126Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:43:10.4499386Z 2025-12-04T11:43:10.4499476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:43:10.4499663Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
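The repro line printed in the summary can also be scripted; a hedged sketch of the same invocation driven through subprocess from the repo root, with the two environment flags copied verbatim from the log:

import os
import subprocess

env = dict(
    os.environ,
    PYTORCH_TEST_WITH_ROCM="1",
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
)
subprocess.run(
    ["python", "test/distributed/fsdp/test_fsdp_overlap.py",
     "TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda"],
    env=env,
    check=False,  # inspect the return code instead of raising
)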
2025-12-04T11:43:10.4499872Z ============================== 1 failed in 21.78s ============================== 2025-12-04T11:43:10.4500004Z Got exit code 1 2025-12-04T11:43:10.4500236Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T11:43:10.4500572Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:43:10.4500938Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-f46632d38cf25471.xml 2025-12-04T11:43:10.4501229Z ============================= test session starts ============================== 2025-12-04T11:43:10.4501438Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:43:10.4501627Z cachedir: .pytest_cache 2025-12-04T11:43:10.4501850Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:43:10.4502088Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:43:10.4502208Z configfile: pytest.ini 2025-12-04T11:43:10.4502433Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:43:10.4502698Z collecting ... collected 1 item / 1 deselected / 0 selected 2025-12-04T11:43:10.4502891Z stepcurrent: skipping 1 already run items. 2025-12-04T11:43:10.4503023Z Running 0 items in this shard 2025-12-04T11:43:10.4503095Z 2025-12-04T11:43:10.4503337Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-f46632d38cf25471.xml - 2025-12-04T11:43:10.4503671Z ============================ 1 deselected in 0.00s ============================= 2025-12-04T11:43:10.4503973Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda'] 2025-12-04T11:43:10.4504212Z 2025-12-04T11:43:10.4504403Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_overlap 1/1 (test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_83469f07f30a7891_.log) 2025-12-04T11:43:10.4504632Z 2025-12-04T11:43:10.4504760Z Finished distributed/fsdp/test_fsdp_overlap 1/1 ... [2025-12-04 11:43:10.439168][2233732.914193443], took 1.24min 2025-12-04T11:43:10.4505197Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:43:10.4505619Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:43:10.4505838Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T11:43:10.4506017Z Uploading artifacts took 0.00 seconds 2025-12-04T11:43:10.4506156Z distributed/fsdp/test_fsdp_overlap 1/1 failed! 2025-12-04T11:43:10.4506343Z Running distributed/test_fake_pg 1/1 ... [2025-12-04 11:43:10.442268][2233732.917297985] 2025-12-04T11:43:10.4506524Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:43:10.4506907Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_fake_pg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:43:10.442437] 2025-12-04T11:43:16.2152314Z 2025-12-04T11:43:16.2153130Z distributed/test_fake_pg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_fake_pg_1.1_8297dee1d92e23e5_.log 2025-12-04T11:43:16.2155889Z Running 16 items in this shard: test/distributed/test_fake_pg.py::TestFakePG::test_all_reduce, test/distributed/test_fake_pg.py::TestFakePG::test_allgather, test/distributed/test_fake_pg.py::TestFakePG::test_alltoall, test/distributed/test_fake_pg.py::TestFakePG::test_alltoall_base, test/distributed/test_fake_pg.py::TestFakePG::test_broadcast, test/distributed/test_fake_pg.py::TestFakePG::test_construct_fsdp, test/distributed/test_fake_pg.py::TestFakePG::test_error_on_collective, test/distributed/test_fake_pg.py::TestFakePG::test_fake_pg_tracing, test/distributed/test_fake_pg.py::TestFakePG::test_fake_process_group_direct_usage_error, test/distributed/test_fake_pg.py::TestFakePG::test_fake_process_group_proper_usage_dispatch, test/distributed/test_fake_pg.py::TestFakePG::test_fsdp_fake_e2e, test/distributed/test_fake_pg.py::TestFakePG::test_fsdp_tp_fake_e2e, test/distributed/test_fake_pg.py::TestFakePG::test_recv, test/distributed/test_fake_pg.py::TestFakePG::test_reduce_scatter, test/distributed/test_fake_pg.py::TestFakePG::test_scatter, test/distributed/test_fake_pg.py::TestFakePG::test_send 2025-12-04T11:43:16.2158366Z 2025-12-04T11:43:16.2158552Z Finished distributed/test_fake_pg 1/1 ... [2025-12-04 11:43:16.214836][2233738.689863065], took 0.10min 2025-12-04T11:43:16.2159209Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:43:16.2173464Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:43:16.2174013Z Running distributed/checkpoint/test_fsdp_model_state 1/1 ... [2025-12-04 11:43:16.217295][2233738.692324911] 2025-12-04T11:43:16.2175001Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:43:16.2176459Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_fsdp_model_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:43:16.217468] 2025-12-04T11:43:32.2064759Z 2025-12-04T11:43:32.2065605Z distributed/checkpoint/test_fsdp_model_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsdp_model_state_1.1_47721b90dc3a8ae1_.log 2025-12-04T11:43:32.2066557Z Running 2 items in this shard: test/distributed/checkpoint/test_fsdp_model_state.py::FsdpModelStateCheckpoint::test_fsdp_model_state_no_resharding, test/distributed/checkpoint/test_fsdp_model_state.py::FsdpModelStateCheckpoint::test_fsdp_model_state_with_resharding 2025-12-04T11:43:32.2067123Z 2025-12-04T11:43:32.2067360Z Finished distributed/checkpoint/test_fsdp_model_state 1/1 ... [2025-12-04 11:43:32.206209][2233754.6812357], took 0.27min 2025-12-04T11:43:32.2076942Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:43:32.2091094Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:43:32.2094759Z Running distributed/fsdp/test_utils 1/1 ... 
[2025-12-04 11:43:32.209234][2233754.684264474] 2025-12-04T11:43:32.2095172Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:43:32.2095969Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:43:32.209401] 2025-12-04T11:43:35.2786735Z 2025-12-04T11:43:35.2787839Z distributed/fsdp/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_utils_1.1_e293aefe28d75870_.log 2025-12-04T11:43:35.2790517Z Running 5 items in this shard: test/distributed/fsdp/test_utils.py::TestUtilsCUDA::test_apply_to_tensors_cpu_cuda_cuda, test/distributed/fsdp/test_utils.py::TestUtilsCUDA::test_apply_to_tensors_device_list0_cuda, test/distributed/fsdp/test_utils.py::TestUtilsCUDA::test_apply_to_tensors_device_list1_cuda, test/distributed/fsdp/test_utils.py::TestUtilsCUDA::test_packed_sequence_cuda, test/distributed/fsdp/test_utils.py::TestUtilsCUDA::test_replace_by_prefix_cuda 2025-12-04T11:43:35.2792350Z 2025-12-04T11:43:35.2792699Z Finished distributed/fsdp/test_utils 1/1 ... [2025-12-04 11:43:35.278358][2233757.753383621], took 0.05min 2025-12-04T11:43:35.2796958Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:43:35.2816280Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:43:35.2817792Z Running distributed/tensor/parallel/test_tp_examples 1/1 ... [2025-12-04 11:43:35.281533][2233757.756563271] 2025-12-04T11:43:35.2818250Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:43:35.2819121Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/parallel/test_tp_examples.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:43:35.281702] 2025-12-04T11:45:36.1436610Z 2025-12-04T11:45:36.1438145Z distributed/tensor/parallel/test_tp_examples 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_tp_examples_1.1_915a7f0dad38edab_.log 2025-12-04T11:45:36.1447241Z Running 16 items in this shard: test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_loss_parallel, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_mlp_inference, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_mlp_training_is_seq_parallel_False_recompute_activation_False, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_mlp_training_is_seq_parallel_True_recompute_activation_False, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_req_grad_float64_thaw_all, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_req_grad_seq_parallel_float32_thaw_all, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_req_grad_seq_parallel_float32_thaw_layers_0_attention_wv__layers_0_feed_forward_w1__layers_1_feed_forward_w2__layers_1_ffn_norm__output__tok_embeddings, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_req_grad_seq_parallel_float32_thaw_layers_1_ffn_norm__norm__output__tok_embeddings, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_req_grad_seq_parallel_float32_thaw_norm__output, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_req_grad_seq_parallel_float32_thaw_norm__output__tok_embeddings, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_req_grad_seq_parallel_float32_thaw_output__tok_embeddings, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_training_is_seq_parallel_False_float32, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_training_is_seq_parallel_False_float64, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_training_is_seq_parallel_True_float32, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_transformer_training_is_seq_parallel_True_float64, test/distributed/tensor/parallel/test_tp_examples.py::DistTensorParallelExampleTest::test_weight_tying 2025-12-04T11:45:36.1453672Z 2025-12-04T11:45:36.1453881Z Finished distributed/tensor/parallel/test_tp_examples 1/1 ... [2025-12-04 11:45:36.143378][2233878.618402991], took 2.01min 2025-12-04T11:45:36.1454492Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:45:36.1465409Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:45:36.1467525Z Running distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 ... 
[2025-12-04 11:45:36.146651][2233878.621680959] 2025-12-04T11:45:36.1467796Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:45:36.1469342Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:45:36.146815] 2025-12-04T11:46:01.6055100Z 2025-12-04T11:46:01.6056715Z distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_clip_grad_norm__1.1_00d947abe0dbee12_.log 2025-12-04T11:46:01.6058842Z Running 2 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py::TestClipGradNormWorldSize2::test_clip_grad_norm_1d, test/distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py::TestClipGradNormWorldSize4::test_clip_grad_norm_2d 2025-12-04T11:46:01.6060004Z 2025-12-04T11:46:01.6061130Z Finished distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 ... [2025-12-04 11:46:01.605128][2233904.080153262], took 0.42min 2025-12-04T11:46:01.6066463Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:46:01.6083103Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:46:01.6085364Z Running distributed/tensor/debug/test_comm_mode 1/1 ... [2025-12-04 11:46:01.608415][2233904.08344465] 2025-12-04T11:46:01.6085733Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:46:01.6087288Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/debug/test_comm_mode.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:46:01.608583] 2025-12-04T11:46:05.7798479Z 2025-12-04T11:46:05.7800221Z distributed/tensor/debug/test_comm_mode 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.debug.test_comm_mode_1.1_b911b3ae2ef9863b_.log 2025-12-04T11:46:05.7803209Z Running 4 items in this shard: test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_coalesced, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_with_c10d, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_with_dtensor 2025-12-04T11:46:05.7804734Z 2025-12-04T11:46:05.7805131Z Finished distributed/tensor/debug/test_comm_mode 1/1 ... [2025-12-04 11:46:05.779466][2233908.254490603], took 0.07min 2025-12-04T11:46:05.7807219Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:46:05.7823207Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:46:05.7823762Z Running distributed/test_dist2 1/1 ... 
[2025-12-04 11:46:05.782241][2233908.257271042] 2025-12-04T11:46:05.7824055Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:46:05.7826072Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_dist2.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:46:05.782411] 2025-12-04T11:47:43.0457341Z 2025-12-04T11:47:43.0458599Z distributed/test_dist2 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_dist2_1.1_7301507b52a4e77f_.log 2025-12-04T11:47:43.0469000Z Running 34 items in this shard: test/distributed/test_dist2.py::ProcessGroupTest::test_context_manager, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_allgather, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_allreduce, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_alltoall_base, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_barrier, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_broadcast, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_gather, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_group_split, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_reduce, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_reduce_scatter, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_remote_group_merge, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_scatter, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_allgather, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_allreduce, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_alltoall_base, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_barrier, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_broadcast, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_gather, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_group_split, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_reduce, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_reduce_scatter, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_remote_group_merge, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_scatter, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_allgather, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_allreduce, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_alltoall_base, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_barrier, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_broadcast, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_gather, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_group_split, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_reduce, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_reduce_scatter, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_remote_group_merge, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_scatter 2025-12-04T11:47:43.0475637Z 2025-12-04T11:47:43.0475849Z Finished distributed/test_dist2 1/1 ... 
[2025-12-04 11:47:43.045418][2234005.520443796], took 1.62min 2025-12-04T11:47:43.0476622Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:47:43.0482999Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:47:43.0485688Z Running distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 ... [2025-12-04 11:47:43.048443][2234005.52347263] 2025-12-04T11:47:43.0485999Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:47:43.0487435Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_grad_scaler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:47:43.048615] 2025-12-04T11:47:55.7834372Z 2025-12-04T11:47:55.7835882Z distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_grad_scaler_1.1_237b4bfe3abcc602_.log 2025-12-04T11:47:55.7837515Z Running 1 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_grad_scaler.py::TestFullyShardGradientScaler::test_gradient_scaler 2025-12-04T11:47:55.7838092Z 2025-12-04T11:47:55.7838506Z Finished distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 ... [2025-12-04 11:47:55.783110][2234018.258134988], took 0.21min 2025-12-04T11:47:55.7847869Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:47:55.7864883Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:47:55.7869109Z Running distributed/launcher/test_run 1/1 ... [2025-12-04 11:47:55.786591][2234018.261620882] 2025-12-04T11:47:55.7869473Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:47:55.7870358Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/launcher/test_run.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:47:55.786754] 2025-12-04T11:48:43.5265753Z 2025-12-04T11:48:43.5266870Z distributed/launcher/test_run 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.launcher.test_run_1.1_47f7f2ad4ce03101_.log 2025-12-04T11:48:43.5274701Z Running 26 items in this shard: test/distributed/launcher/test_run.py::ElasticLaunchTest::test_capture_logs_using_default_logs_specs, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_init_method_env_with_torchelastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_init_method_tcp_with_torchelastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_not_torchelastic_launched, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_torchelastic_launched, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_torchelastic_launched_with_logs_spec_defined, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_agent_raise_exception, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_multiple_agents, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_worker_raise_exception, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_run_path, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_shutdown, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_standalone, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_bash, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_default_nproc, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_python, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_python_caffe2_bc, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_with_env_vars, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_logs_logs_spec_entrypoint_must_be_defined, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_min_max_nodes_parse, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_gpu_launch_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_auto_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_number_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_unknown_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_xpu_launch_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_virtual_local_rank 2025-12-04T11:48:43.5281532Z 2025-12-04T11:48:43.5281711Z Finished distributed/launcher/test_run 1/1 ... [2025-12-04 11:48:43.526558][2234066.001583459], took 0.80min 2025-12-04T11:48:43.5282305Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:48:43.5296420Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:48:43.5298620Z Running distributed/fsdp/test_fsdp_backward_prefetch 1/1 ... 
[2025-12-04 11:48:43.529768][2234066.004798309] 2025-12-04T11:48:43.5298877Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:48:43.5300752Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_backward_prefetch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:48:43.529939] 2025-12-04T11:48:53.4105225Z 2025-12-04T11:48:53.4106884Z distributed/fsdp/test_fsdp_backward_prefetch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_backward_prefetch_1.1_df031cd9b44eb6f1_.log 2025-12-04T11:48:53.4107728Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_backward_prefetch.py::TestBackwardPrefetch::test_backward_prefetch 2025-12-04T11:48:53.4108674Z 2025-12-04T11:48:53.4109137Z Finished distributed/fsdp/test_fsdp_backward_prefetch 1/1 ... [2025-12-04 11:48:53.410062][2234075.885087941], took 0.16min 2025-12-04T11:48:53.4117484Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:48:53.4133323Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:48:53.4134542Z Running distributed/fsdp/test_fsdp_pure_fp16 1/1 ... [2025-12-04 11:48:53.413308][2234075.88833721] 2025-12-04T11:48:53.4134863Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:48:53.4136305Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_pure_fp16.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:48:53.413478] 2025-12-04T11:50:01.1741222Z 2025-12-04T11:50:01.1741838Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 1/1 (test/test-reports/distributed.fsdp.test_fsdp_pure_fp16_1.1_409903cf23c40aa5_.log) 2025-12-04T11:50:01.1742929Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-15b41a24d60809a9.xml 2025-12-04T11:50:01.1743988Z ============================= test session starts ============================== 2025-12-04T11:50:01.1744476Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:50:01.1744888Z cachedir: .pytest_cache 2025-12-04T11:50:01.1745366Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:50:01.1745892Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:50:01.1746136Z configfile: pytest.ini 2025-12-04T11:50:01.1746604Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:50:01.1747112Z collecting ... 
collected 2 items 2025-12-04T11:50:01.1747404Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:50:01.1748197Z Running 2 items in this shard: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda, test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T11:50:01.1748821Z 2025-12-04T11:50:01.1749375Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 11:48:55.080000 221802 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 221871 2025-12-04T11:50:01.1750247Z I1204 11:48:55.081000 221802 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 221872 2025-12-04T11:50:01.1750840Z I1204 11:48:55.081000 221802 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 221873 2025-12-04T11:50:01.1751417Z I1204 11:48:55.082000 221802 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 221874 2025-12-04T11:50:01.1752592Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1753607Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1754623Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1755603Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1756731Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1757732Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1758711Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1759657Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1760214Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T11:50:01.1760730Z return func(*args, **kwargs) 2025-12-04T11:50:01.1761013Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1761455Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1762179Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1762794Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1763412Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1763999Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1764565Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1765150Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1765746Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1766332Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1766931Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1767507Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1768091Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1768675Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1769508Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 2300575744 and is now 4039114752. 
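The UserWarnings earlier in this block note that `device_id` was passed as a bare `cuda` device with no index. A sketch of the per-rank pinning they suggest, assuming a process group is already initialized; `shard` is a hypothetical helper name:

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def shard(model: nn.Module, rank: int) -> FSDP:
    # Pin this rank's device before FSDP initialization, as the warning advises.
    torch.cuda.set_device(rank)
    # Pass an indexed device rather than the bare "cuda" seen in the warning.
    return FSDP(model, device_id=torch.device("cuda", rank))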
2025-12-04T11:50:01.1770238Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1770598Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1771175Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1771652Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1772025Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1772483Z [rank2]:E1204 11:49:03.196000 221873 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:50:01.1772733Z dist init r=2, world=4 2025-12-04T11:50:01.1772942Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1773288Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1773787Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1774279Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1774769Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1775227Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1775675Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1776148Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1776621Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1777095Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1777568Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1778027Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1778489Z [rank0]:E1204 
11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1778989Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1779606Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 2453667840 and is now 4192206848. 2025-12-04T11:50:01.1780215Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1780562Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1781109Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1781600Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1781964Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1782379Z [rank0]:E1204 11:49:03.206000 221871 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:50:01.1782622Z dist init r=0, world=4 2025-12-04T11:50:01.1782822Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1783156Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1783642Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1784121Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1784595Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1785041Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1785483Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1785945Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1786406Z [rank3]:E1204 11:49:03.246000 221874 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1786870Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1787329Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1787780Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1788271Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1788733Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1789343Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 2250244096 and is now 3988783104. 2025-12-04T11:50:01.1789975Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1790325Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1790898Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1791359Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1791721Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1792132Z [rank3]:E1204 11:49:03.246000 221874 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:50:01.1792372Z dist init r=3, world=4 2025-12-04T11:50:01.1792575Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1792912Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1793399Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1793880Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1794354Z [rank1]:E1204 11:49:03.282000 221872 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1794803Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1795242Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1795705Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1796168Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1796632Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1797123Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1797575Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1798028Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1798493Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1799108Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 2317352960 and is now 4055891968. 
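Reading the numbers in these reports: the small caching-allocator delta is what trips the check (a few KiB of tensors still referenced), while the much larger driver delta likely reflects allocations made outside the caching allocator, such as communicator and context buffers. Quick arithmetic on rank 1's figures:

# Deltas computed from rank 1's figures above.
before_alloc, after_alloc = 512, 6656              # caching allocator, bytes
before_drv, after_drv = 2317352960, 4055891968     # driver allocated, bytes
print(after_alloc - before_alloc)                  # 6144 bytes of live tensors
print(round((after_drv - before_drv) / 2**30, 2))  # ~1.62 GiB driver growth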
2025-12-04T11:50:01.1799750Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1800096Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1800637Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1801098Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1801462Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1801880Z [rank1]:E1204 11:49:03.282000 221872 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:50:01.1802122Z dist init r=1, world=4 2025-12-04T11:50:01.1802535Z [rank0]:[W1204 11:49:03.944609600 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:50:01.1802945Z FAILED [10.2146s] [ 50%] 2025-12-04T11:50:01.1803014Z 2025-12-04T11:50:01.1803074Z =================================== FAILURES =================================== 2025-12-04T11:50:01.1803256Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________ 2025-12-04T11:50:01.1803422Z Traceback (most recent call last): 2025-12-04T11:50:01.1803670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:50:01.1803914Z self._join_processes(fn) 2025-12-04T11:50:01.1804163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:50:01.1804426Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:50:01.1804691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:50:01.1804951Z raise RuntimeError(error) 2025-12-04T11:50:01.1805105Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:50:01.1805267Z Traceback (most recent call last): 2025-12-04T11:50:01.1805507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1805747Z getattr(self, test_name)() 2025-12-04T11:50:01.1806018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1806251Z fn() 2025-12-04T11:50:01.1806455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1806685Z method(*args, **kwargs) 2025-12-04T11:50:01.1806907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1807135Z method(*args, **kwargs) 2025-12-04T11:50:01.1807353Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1807579Z with policy(): 2025-12-04T11:50:01.1807792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1808023Z raise RuntimeError(msg) 2025-12-04T11:50:01.1808396Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 2453667840 and is now 4192206848. 2025-12-04T11:50:01.1808763Z 2025-12-04T11:50:01.1808838Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1809141Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1809367Z 2025-12-04T11:50:01.1809458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1809586Z 2025-12-04T11:50:01.1809644Z Process 2 exited with error code 10 and exception: 2025-12-04T11:50:01.1809840Z Traceback (most recent call last): 2025-12-04T11:50:01.1810082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1810329Z getattr(self, test_name)() 2025-12-04T11:50:01.1810564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1810799Z fn() 2025-12-04T11:50:01.1811000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1811229Z method(*args, **kwargs) 2025-12-04T11:50:01.1811448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1811677Z method(*args, **kwargs) 2025-12-04T11:50:01.1811896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1812122Z with policy(): 2025-12-04T11:50:01.1812332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1812563Z raise RuntimeError(msg) 2025-12-04T11:50:01.1812939Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 2300575744 and is now 4039114752. 2025-12-04T11:50:01.1813281Z 2025-12-04T11:50:01.1813355Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1813651Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1813874Z 2025-12-04T11:50:01.1813962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1814086Z 2025-12-04T11:50:01.1814087Z 2025-12-04T11:50:01.1814169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:50:01.1814370Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:50:01.1814770Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-15b41a24d60809a9.xml - 2025-12-04T11:50:01.1815113Z =========================== short test summary info ============================ 2025-12-04T11:50:01.1815420Z FAILED [10.2146s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:50:01.1815709Z Traceback (most recent call last): 2025-12-04T11:50:01.1815951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1816194Z getattr(self, test_name)() 2025-12-04T11:50:01.1816426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1816659Z fn() 2025-12-04T11:50:01.1816861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1817127Z method(*args, **kwargs) 2025-12-04T11:50:01.1817347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1817575Z method(*args, **kwargs) 2025-12-04T11:50:01.1817791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1818014Z with policy(): 2025-12-04T11:50:01.1818226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1818455Z raise RuntimeError(msg) 2025-12-04T11:50:01.1818829Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 2453667840 and is now 4192206848. 
2025-12-04T11:50:01.1819168Z 2025-12-04T11:50:01.1819248Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1819543Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1819803Z 2025-12-04T11:50:01.1819896Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1820017Z 2025-12-04T11:50:01.1820078Z Process 2 exited with error code 10 and exception: 2025-12-04T11:50:01.1820218Z Traceback (most recent call last): 2025-12-04T11:50:01.1820460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1820701Z getattr(self, test_name)() 2025-12-04T11:50:01.1820932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1821167Z fn() 2025-12-04T11:50:01.1821367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1821599Z method(*args, **kwargs) 2025-12-04T11:50:01.1821817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1822047Z method(*args, **kwargs) 2025-12-04T11:50:01.1822263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1822488Z with policy(): 2025-12-04T11:50:01.1822697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1822928Z raise RuntimeError(msg) 2025-12-04T11:50:01.1823327Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 2300575744 and is now 4039114752. 2025-12-04T11:50:01.1823665Z 2025-12-04T11:50:01.1823738Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1824035Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1824257Z 2025-12-04T11:50:01.1824344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1824531Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:50:01.1824689Z ============================== 1 failed in 10.37s ============================== 2025-12-04T11:50:01.1824822Z Got exit code 1 2025-12-04T11:50:01.1824921Z Retrying single test... 
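The RuntimeError driving these failures comes from PyTorch's CUDA memory-leak checker, enabled for this shard via the mem_leak_check config and the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 variable shown in the repro command. It compares caching-allocator and driver-level memory counters taken before and after the test body, which is what the "was 512 and is now reported as 6656" numbers report. A minimal sketch of the idea, assuming a hypothetical check_for_leak helper (the real check lives in torch/testing/_internal/common_utils.py and is considerably more involved):

    import torch

    def check_for_leak(test_fn, device: int = 0) -> None:
        # Hypothetical, simplified sketch; not the actual CudaMemoryLeakCheck.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)    # caching allocator
        free_before, total = torch.cuda.mem_get_info(device)  # driver-level view
        test_fn()
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        # Report a leak only when the driver agrees with the allocator, mirroring
        # the "CUDA driver API confirmed a leak" wording in the log above.
        if alloc_after > alloc_before and (total - free_after) > (total - free_before):
            raise RuntimeError(
                f"leak on device {device}: allocator {alloc_before} -> {alloc_after}, "
                f"driver {total - free_before} -> {total - free_after}"
            )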
2025-12-04T11:50:01.1825188Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f08520c07221ce7a.xml 2025-12-04T11:50:01.1825509Z ============================= test session starts ============================== 2025-12-04T11:50:01.1825720Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:50:01.1825907Z cachedir: .pytest_cache 2025-12-04T11:50:01.1826128Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:50:01.1826366Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:50:01.1826487Z configfile: pytest.ini 2025-12-04T11:50:01.1826713Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:50:01.1826983Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T11:50:01.1827270Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda 2025-12-04T11:50:01.1827534Z Running 1 items in this shard 2025-12-04T11:50:01.1827613Z 2025-12-04T11:50:01.1827878Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 11:49:07.806000 222204 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 222273 2025-12-04T11:50:01.1828329Z I1204 11:49:07.807000 222204 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 222274 2025-12-04T11:50:01.1828668Z I1204 11:49:07.808000 222204 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 222275 2025-12-04T11:50:01.1829006Z I1204 11:49:07.808000 222204 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 222276 2025-12-04T11:50:01.1829726Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1830320Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1830902Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1831481Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1832089Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:50:01.1832671Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1833248Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1833824Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1834210Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T11:50:01.1834578Z return func(*args, **kwargs) 2025-12-04T11:50:01.1834830Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1835171Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1835659Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1836141Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1836618Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1837067Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1837507Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1837968Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1838431Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1838891Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1839351Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1839846Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1840302Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1840765Z [rank1]:E1204 
11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1841414Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 2317352960 and is now 4055891968. 2025-12-04T11:50:01.1841991Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1842341Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1842885Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1843345Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1843709Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1844153Z [rank1]:E1204 11:49:15.771000 222274 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:50:01.1844394Z dist init r=1, world=4 2025-12-04T11:50:01.1844599Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1844934Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1845417Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1845894Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1846373Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1846822Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1847261Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1847720Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1848184Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1848645Z [rank2]:E1204 11:49:15.776000 222275 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1849107Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1849559Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1850064Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1850580Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1851194Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 2300575744 and is now 4039114752. 2025-12-04T11:50:01.1851766Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1852113Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1852658Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1853153Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1853516Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1853926Z [rank2]:E1204 11:49:15.776000 222275 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:50:01.1854166Z dist init r=2, world=4 2025-12-04T11:50:01.1854368Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1854703Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1855186Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1855667Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1856144Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1856588Z [rank0]:E1204 11:49:15.778000 222273 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1857025Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1857486Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1857947Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1858407Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1858865Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1859313Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1859870Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1860334Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1860944Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 2453667840 and is now 4192206848. 
2025-12-04T11:50:01.1861514Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1861858Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1862404Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1862890Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1863253Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1863664Z [rank0]:E1204 11:49:15.778000 222273 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:50:01.1863905Z dist init r=0, world=4 2025-12-04T11:50:01.1864108Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1864444Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1864926Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1865402Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1865877Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1866324Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1866764Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1867226Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1867687Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1868148Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1868607Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1869080Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1869536Z [rank3]:E1204 
11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1870030Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1870641Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 2250244096 and is now 3988783104. 2025-12-04T11:50:01.1871214Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1871594Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1872141Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1872601Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1872963Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1873382Z [rank3]:E1204 11:49:15.794000 222276 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:50:01.1873623Z dist init r=3, world=4 2025-12-04T11:50:01.1874022Z [rank0]:[W1204 11:49:16.527761958 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:50:01.1874431Z FAILED [10.0159s] [100%] 2025-12-04T11:50:01.1874501Z 2025-12-04T11:50:01.1874560Z =================================== FAILURES =================================== 2025-12-04T11:50:01.1874740Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________ 2025-12-04T11:50:01.1874906Z Traceback (most recent call last): 2025-12-04T11:50:01.1875150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:50:01.1875396Z self._join_processes(fn) 2025-12-04T11:50:01.1875643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:50:01.1875906Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:50:01.1876175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:50:01.1876434Z raise RuntimeError(error) 2025-12-04T11:50:01.1876585Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T11:50:01.1876746Z Traceback (most recent call last): 2025-12-04T11:50:01.1876988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1877227Z getattr(self, test_name)() 2025-12-04T11:50:01.1877458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1877688Z fn() 2025-12-04T11:50:01.1877917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1878152Z method(*args, **kwargs) 2025-12-04T11:50:01.1878373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1878603Z method(*args, **kwargs) 2025-12-04T11:50:01.1878820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1879045Z with policy(): 2025-12-04T11:50:01.1879256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1879486Z raise RuntimeError(msg) 2025-12-04T11:50:01.1879917Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 2300575744 and is now 4039114752. 2025-12-04T11:50:01.1880286Z 2025-12-04T11:50:01.1880361Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1880659Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1880881Z 2025-12-04T11:50:01.1880971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1881095Z 2025-12-04T11:50:01.1881097Z 2025-12-04T11:50:01.1881177Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:50:01.1881377Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:50:01.1881743Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f08520c07221ce7a.xml - 2025-12-04T11:50:01.1882083Z =========================== short test summary info ============================ 2025-12-04T11:50:01.1882386Z FAILED [10.0159s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T11:50:01.1882671Z Traceback (most recent call last): 2025-12-04T11:50:01.1882915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1883156Z getattr(self, test_name)() 2025-12-04T11:50:01.1883390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1883620Z fn() 2025-12-04T11:50:01.1883819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1884049Z method(*args, **kwargs) 2025-12-04T11:50:01.1884268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1884499Z method(*args, **kwargs) 2025-12-04T11:50:01.1884715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1884941Z with policy(): 2025-12-04T11:50:01.1885149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1885380Z raise RuntimeError(msg) 2025-12-04T11:50:01.1885748Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 2300575744 and is now 4039114752. 2025-12-04T11:50:01.1886085Z 2025-12-04T11:50:01.1886162Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1886492Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1886718Z 2025-12-04T11:50:01.1886806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1886993Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:50:01.1887157Z ======================= 1 failed, 1 deselected in 10.17s ======================= 2025-12-04T11:50:01.1887295Z Got exit code 1 2025-12-04T11:50:01.1887392Z Retrying single test... 
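Each retry also reprints the same FSDP UserWarning from _init_utils.py: device_id was given as the bare string "cuda" with no index, so FSDP falls back to the current device of each rank. The warning itself names the two fixes, calling torch.cuda.set_device() before FSDP initialization or passing an indexed device. A minimal sketch of both, assuming a process group is already initialized and that rank and model are placeholders for this job's 4-GPU setup:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int, model: torch.nn.Module) -> FSDP:
        # Bind this process to its GPU so "current device" is well defined...
        torch.cuda.set_device(rank)
        # ...and/or pass an indexed device instead of the bare "cuda" string,
        # which is what triggered the UserWarning in this run.
        return FSDP(model, device_id=torch.device(f"cuda:{rank}"))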
2025-12-04T11:50:01.1887656Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-4413e31fe02281ab.xml 2025-12-04T11:50:01.1887948Z ============================= test session starts ============================== 2025-12-04T11:50:01.1888156Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:50:01.1888342Z cachedir: .pytest_cache 2025-12-04T11:50:01.1888563Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:50:01.1888827Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:50:01.1888946Z configfile: pytest.ini 2025-12-04T11:50:01.1889171Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:50:01.1889437Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T11:50:01.1889781Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda 2025-12-04T11:50:01.1890037Z Running 1 items in this shard 2025-12-04T11:50:01.1890112Z 2025-12-04T11:50:01.1890380Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 11:49:20.253000 222606 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 222675 2025-12-04T11:50:01.1890835Z I1204 11:49:20.254000 222606 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 222676 2025-12-04T11:50:01.1891180Z I1204 11:49:20.254000 222606 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 222677 2025-12-04T11:50:01.1891519Z I1204 11:49:20.255000 222606 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 222678 2025-12-04T11:50:01.1892198Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1892780Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1893365Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1893943Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1894518Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:50:01.1895091Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1895895Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:50:01.1896473Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:50:01.1896858Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T11:50:01.1897225Z return func(*args, **kwargs) 2025-12-04T11:50:01.1897441Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1897782Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1898302Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1898781Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1899260Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1899740Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1900180Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1900648Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1901110Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1901572Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1902034Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1902483Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1902935Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1903397Z [rank2]:E1204 
11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1904012Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 2300575744 and is now 4039114752. 2025-12-04T11:50:01.1904588Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1904962Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1905506Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1905968Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1906334Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1906749Z [rank2]:E1204 11:49:28.385000 222677 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:50:01.1906995Z dist init r=2, world=4 2025-12-04T11:50:01.1907201Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1907569Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1908051Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1908525Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1909002Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1909447Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1909920Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1910386Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1910849Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1911307Z [rank1]:E1204 11:49:28.388000 222676 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1911769Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1912220Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1912674Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1913140Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1913780Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 2317352960 and is now 4055891968. 2025-12-04T11:50:01.1914353Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1914704Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.1915247Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T11:50:01.1915711Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.1916074Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.1916518Z [rank1]:E1204 11:49:28.388000 222676 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:50:01.1916760Z dist init r=1, world=4 2025-12-04T11:50:01.1916963Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.1917299Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.1917780Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.1918255Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.1918732Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.1919178Z [rank3]:E1204 11:49:28.391000 222678 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.1919617Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1920114Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1920577Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.1921041Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.1921505Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.1921955Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.1922407Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.1922870Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.1923519Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 2250244096 and is now 3988783104. 
2025-12-04T11:50:01.1924097Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1924446Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T11:50:01.1924989Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda
2025-12-04T11:50:01.1925454Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1925845Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:50:01.1926257Z [rank3]:E1204 11:49:28.391000 222678 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T11:50:01.1926497Z dist init r=3, world=4
2025-12-04T11:50:01.1926700Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T11:50:01.1927038Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T11:50:01.1927523Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:50:01.1928001Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T11:50:01.1928476Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:50:01.1928922Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T11:50:01.1929359Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1929871Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1930340Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1930801Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1931266Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:50:01.1931716Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T11:50:01.1932202Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:50:01.1932672Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T11:50:01.1933283Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 2453667840 and is now 4192206848.
2025-12-04T11:50:01.1933857Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1934210Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T11:50:01.1934760Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda
2025-12-04T11:50:01.1943285Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1943679Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:50:01.1944112Z [rank0]:E1204 11:49:28.424000 222675 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T11:50:01.1944367Z dist init r=0, world=4
2025-12-04T11:50:01.1944791Z [rank0]:[W1204 11:49:28.163385867 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T11:50:01.1945220Z FAILED [10.0159s] [100%]
2025-12-04T11:50:01.1945292Z
2025-12-04T11:50:01.1945365Z =================================== FAILURES ===================================
2025-12-04T11:50:01.1945557Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________
2025-12-04T11:50:01.1945731Z Traceback (most recent call last):
2025-12-04T11:50:01.1945988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T11:50:01.1946244Z self._join_processes(fn)
2025-12-04T11:50:01.1946498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T11:50:01.1946773Z self._check_return_codes(fn, elapsed_time)
2025-12-04T11:50:01.1947053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T11:50:01.1947326Z raise RuntimeError(error)
2025-12-04T11:50:01.1947491Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T11:50:01.1947665Z Traceback (most recent call last):
2025-12-04T11:50:01.1947918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:50:01.1948165Z getattr(self, test_name)()
2025-12-04T11:50:01.1948400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:50:01.1948635Z fn()
2025-12-04T11:50:01.1948841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1949078Z method(*args, **kwargs)
2025-12-04T11:50:01.1949374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1949614Z method(*args, **kwargs)
2025-12-04T11:50:01.1949881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:50:01.1950112Z with policy():
2025-12-04T11:50:01.1950329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:50:01.1950572Z raise RuntimeError(msg)
2025-12-04T11:50:01.1950953Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 2317352960 and is now 4055891968.
2025-12-04T11:50:01.1951293Z
2025-12-04T11:50:01.1951373Z To execute this test, run the following from the base repo dir:
2025-12-04T11:50:01.1951687Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda
2025-12-04T11:50:01.1951948Z
2025-12-04T11:50:01.1952038Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:50:01.1952166Z
2025-12-04T11:50:01.1952168Z
2025-12-04T11:50:01.1952248Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:50:01.1952454Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T11:50:01.1952828Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-4413e31fe02281ab.xml -
2025-12-04T11:50:01.1953174Z =========================== short test summary info ============================
2025-12-04T11:50:01.1953487Z FAILED [10.0159s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T11:50:01.1953780Z Traceback (most recent call last):
2025-12-04T11:50:01.1954031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:50:01.1954281Z getattr(self, test_name)()
2025-12-04T11:50:01.1954518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:50:01.1954753Z fn()
2025-12-04T11:50:01.1954960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1955194Z method(*args, **kwargs)
2025-12-04T11:50:01.1955418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1955653Z method(*args, **kwargs)
2025-12-04T11:50:01.1955878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:50:01.1956115Z with policy():
2025-12-04T11:50:01.1956338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:50:01.1956581Z raise RuntimeError(msg)
2025-12-04T11:50:01.1956961Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 2317352960 and is now 4055891968.
2025-12-04T11:50:01.1957306Z
2025-12-04T11:50:01.1957385Z To execute this test, run the following from the base repo dir:
2025-12-04T11:50:01.1957690Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda
2025-12-04T11:50:01.1957917Z
2025-12-04T11:50:01.1958008Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:50:01.1958234Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
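All of the failures in this shard come from the test suite's CUDA memory-leak checker, enabled for this job through PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: the policy() context manager visible in the tracebacks snapshots per-device allocator usage before the test body and raises from __exit__ if usage grew. A minimal sketch of that idea only, not the actual torch.testing._internal.common_utils code; the class name CudaLeakCheck is made up, and the real checker additionally compares CUDA-driver-reported memory and retries garbage collection before declaring a leak:

import gc

import torch

class CudaLeakCheck:
    # Illustrative leak-check "policy" context manager (an assumption,
    # not PyTorch's implementation): record allocator usage per device
    # on entry, re-check on exit, and raise if it grew.
    def __enter__(self):
        gc.collect()
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        self.before = [torch.cuda.memory_allocated(d)
                       for d in range(torch.cuda.device_count())]
        return self

    def __exit__(self, exc_type, exc, tb):
        gc.collect()
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        for device, before in enumerate(self.before):
            after = torch.cuda.memory_allocated(device)
            if after > before:
                # Mirrors the message in this log, e.g. "allocated memory
                # was 512 and is now reported as 6656 on device 0".
                raise RuntimeError(
                    f"Caching allocator allocated memory was {before} "
                    f"and is now reported as {after} on device {device}."
                )

When the checker trips inside a worker, that process exits with code 10, and the parent test re-raises the captured traceback, which is what the FAILURES block above and the session summary below show.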
2025-12-04T11:50:01.1958413Z ======================= 1 failed, 1 deselected in 10.17s =======================
2025-12-04T11:50:01.1958564Z Got exit code 1
2025-12-04T11:50:01.1958761Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda
2025-12-04T11:50:01.1959071Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:50:01.1959444Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-c7590130d10f5b89.xml
2025-12-04T11:50:01.1959791Z ============================= test session starts ==============================
2025-12-04T11:50:01.1960009Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T11:50:01.1960206Z cachedir: .pytest_cache
2025-12-04T11:50:01.1960442Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:50:01.1960728Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:50:01.1960856Z configfile: pytest.ini
2025-12-04T11:50:01.1961089Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:50:01.1961369Z collecting ... collected 2 items / 1 deselected / 1 selected
2025-12-04T11:50:01.1961541Z stepcurrent: skipping 1 already run items.
2025-12-04T11:50:01.1961677Z Running 1 items in this shard
2025-12-04T11:50:01.1961749Z
2025-12-04T11:50:01.1962034Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 11:49:32.739000 223008 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 223077
2025-12-04T11:50:01.1962506Z I1204 11:49:32.740000 223008 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 223078
2025-12-04T11:50:01.1962862Z I1204 11:49:32.740000 223008 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 223079
2025-12-04T11:50:01.1963212Z I1204 11:49:32.741000 223008 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 223080
2025-12-04T11:50:01.1963706Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T11:50:01.1964093Z return func(*args, **kwargs)
2025-12-04T11:50:01.1964321Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T11:50:01.1964673Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T11:50:01.1965174Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:50:01.1965667Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T11:50:01.1966157Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:50:01.1966614Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T11:50:01.1967065Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1967569Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1968045Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1968515Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1968992Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:50:01.1969451Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T11:50:01.1969952Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:50:01.1970457Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T11:50:01.1971093Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 2453667840 and is now 3523215360.
2025-12-04T11:50:01.1971684Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1972039Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T11:50:01.1972604Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda
2025-12-04T11:50:01.1973088Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1973463Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:50:01.1973882Z [rank0]:E1204 11:49:38.126000 223077 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T11:50:01.1974131Z dist init r=0, world=4
2025-12-04T11:50:01.1974339Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T11:50:01.1974682Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T11:50:01.1975174Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:50:01.1975661Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T11:50:01.1976151Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:50:01.1976607Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T11:50:01.1977072Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1977544Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1978012Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1978481Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1978951Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:50:01.1979428Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T11:50:01.1979928Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:50:01.1980396Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T11:50:01.1981028Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 2250244096 and is now 3319791616.
2025-12-04T11:50:01.1981622Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1981976Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T11:50:01.1982534Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda
2025-12-04T11:50:01.1983015Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1983382Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:50:01.1983798Z [rank3]:E1204 11:49:38.128000 223080 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T11:50:01.1984046Z dist init r=3, world=4
2025-12-04T11:50:01.1984257Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T11:50:01.1984600Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T11:50:01.1985083Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:50:01.1985564Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T11:50:01.1986070Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:50:01.1986523Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T11:50:01.1986961Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1987425Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1987886Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1988348Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1988839Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:50:01.1989288Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T11:50:01.1989776Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:50:01.1990244Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T11:50:01.1990872Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 2317352960 and is now 3386900480.
2025-12-04T11:50:01.1991459Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1991808Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T11:50:01.1992359Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda
2025-12-04T11:50:01.1992830Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.1993197Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:50:01.1993615Z [rank1]:E1204 11:49:38.138000 223078 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T11:50:01.1993858Z dist init r=1, world=4
2025-12-04T11:50:01.1994064Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T11:50:01.1994402Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T11:50:01.1994887Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:50:01.1995398Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T11:50:01.1995875Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:50:01.1996317Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T11:50:01.1996761Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1997221Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1997682Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.1998172Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T11:50:01.1998636Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:50:01.1999083Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T11:50:01.1999535Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:50:01.2000046Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T11:50:01.2000670Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 2300575744 and is now 3370123264.
2025-12-04T11:50:01.2001253Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.2001599Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T11:50:01.2002152Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda
2025-12-04T11:50:01.2002623Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T11:50:01.2002984Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:50:01.2003394Z [rank2]:E1204 11:49:38.158000 223079 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T11:50:01.2003634Z dist init r=2, world=4
2025-12-04T11:50:01.2004034Z [rank0]:[W1204 11:49:38.783650497 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T11:50:01.2004441Z FAILED [6.9127s] [100%]
2025-12-04T11:50:01.2004545Z
2025-12-04T11:50:01.2004602Z =================================== FAILURES ===================================
2025-12-04T11:50:01.2004786Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________
2025-12-04T11:50:01.2004953Z Traceback (most recent call last):
2025-12-04T11:50:01.2005196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T11:50:01.2005438Z self._join_processes(fn)
2025-12-04T11:50:01.2005682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T11:50:01.2005944Z self._check_return_codes(fn, elapsed_time)
2025-12-04T11:50:01.2006211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T11:50:01.2006469Z raise RuntimeError(error)
2025-12-04T11:50:01.2006622Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T11:50:01.2006809Z Traceback (most recent call last):
2025-12-04T11:50:01.2007046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:50:01.2007286Z getattr(self, test_name)()
2025-12-04T11:50:01.2007514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:50:01.2007744Z fn()
2025-12-04T11:50:01.2007945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.2008174Z method(*args, **kwargs)
2025-12-04T11:50:01.2008394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:50:01.2008620Z method(*args, **kwargs)
2025-12-04T11:50:01.2008841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:50:01.2009067Z with policy():
2025-12-04T11:50:01.2009277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:50:01.2009504Z raise RuntimeError(msg)
2025-12-04T11:50:01.2009926Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 2453667840 and is now 3523215360.
2025-12-04T11:50:01.2016163Z 2025-12-04T11:50:01.2016237Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2016539Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2016771Z 2025-12-04T11:50:01.2016859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2017045Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:50:01.2017211Z ======================= 1 failed, 1 deselected in 7.06s ======================== 2025-12-04T11:50:01.2017350Z Got exit code 1 2025-12-04T11:50:01.2017447Z Retrying single test... 2025-12-04T11:50:01.2017712Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-18ed10254416d539.xml 2025-12-04T11:50:01.2018003Z ============================= test session starts ============================== 2025-12-04T11:50:01.2018212Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:50:01.2018398Z cachedir: .pytest_cache 2025-12-04T11:50:01.2018619Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:50:01.2018857Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:50:01.2018974Z configfile: pytest.ini 2025-12-04T11:50:01.2019201Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:50:01.2019469Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T11:50:01.2019810Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T11:50:01.2020074Z Running 1 items in this shard 2025-12-04T11:50:01.2020148Z 2025-12-04T11:50:01.2020420Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 11:49:42.013000 223410 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 223479 2025-12-04T11:50:01.2020880Z I1204 11:49:42.014000 223410 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 223480 2025-12-04T11:50:01.2021389Z I1204 11:49:42.015000 223410 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 223481 2025-12-04T11:50:01.2021759Z I1204 11:49:42.015000 223410 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 223482 2025-12-04T11:50:01.2022235Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T11:50:01.2022602Z return func(*args, **kwargs) 2025-12-04T11:50:01.2022814Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.2023154Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.2023641Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2024119Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.2024624Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2025068Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.2025504Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2025965Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2026428Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2026890Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2027350Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2027797Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.2028248Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2028713Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.2029344Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 2300575744 and is now 3370123264. 
2025-12-04T11:50:01.2029968Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2030314Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2030899Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2031372Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2031732Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2032142Z [rank2]:E1204 11:49:47.471000 223481 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:50:01.2032482Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.2032815Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.2033297Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2033802Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.2034275Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2034720Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.2035153Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2035613Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2036075Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2036537Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2036995Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2037441Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.2037892Z [rank1]:E1204 11:49:47.471000 223480 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2038354Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.2038980Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 2317352960 and is now 3386900480. 2025-12-04T11:50:01.2039562Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2039983Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2040533Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2041002Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2041363Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2041772Z [rank1]:E1204 11:49:47.471000 223480 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:50:01.2042012Z dist init r=2, world=4 2025-12-04T11:50:01.2042113Z dist init r=1, world=4 2025-12-04T11:50:01.2042313Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.2042680Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.2043159Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2043635Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.2044108Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2044555Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.2044990Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2045452Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2045909Z [rank3]:E1204 11:49:47.473000 
223482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2046366Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2046825Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2047274Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.2047723Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2048185Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.2048830Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 2250244096 and is now 3319791616. 2025-12-04T11:50:01.2049413Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2049800Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2050349Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2050815Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2051174Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2051585Z [rank3]:E1204 11:49:47.473000 223482 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:50:01.2051853Z dist init r=3, world=4 2025-12-04T11:50:01.2052055Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.2052389Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.2052870Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2053345Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.2053824Z [rank0]:E1204 11:49:47.477000 223479 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2054270Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.2054705Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2055163Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2055621Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2056081Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2056545Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2056991Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.2057441Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2057902Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.2058553Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 2453667840 and is now 3523215360. 
2025-12-04T11:50:01.2059139Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2059488Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2060080Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2060550Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2060952Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2061362Z [rank0]:E1204 11:49:47.477000 223479 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:50:01.2061603Z dist init r=0, world=4 2025-12-04T11:50:01.2061998Z [rank0]:[W1204 11:49:47.157425903 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:50:01.2062403Z FAILED [7.1130s] [100%] 2025-12-04T11:50:01.2062467Z 2025-12-04T11:50:01.2062524Z =================================== FAILURES =================================== 2025-12-04T11:50:01.2062710Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________ 2025-12-04T11:50:01.2062880Z Traceback (most recent call last): 2025-12-04T11:50:01.2063124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:50:01.2063366Z self._join_processes(fn) 2025-12-04T11:50:01.2063614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:50:01.2063875Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:50:01.2064140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:50:01.2064397Z raise RuntimeError(error) 2025-12-04T11:50:01.2064547Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:50:01.2064706Z Traceback (most recent call last): 2025-12-04T11:50:01.2064945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2065189Z getattr(self, test_name)() 2025-12-04T11:50:01.2065416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2065647Z fn() 2025-12-04T11:50:01.2065845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2066074Z method(*args, **kwargs) 2025-12-04T11:50:01.2066292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2066518Z method(*args, **kwargs) 2025-12-04T11:50:01.2066735Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2066961Z with policy(): 2025-12-04T11:50:01.2067205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2067439Z raise RuntimeError(msg) 2025-12-04T11:50:01.2067819Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 2453667840 and is now 3523215360. 2025-12-04T11:50:01.2068165Z 2025-12-04T11:50:01.2068241Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2068547Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2068778Z 2025-12-04T11:50:01.2068867Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2068989Z 2025-12-04T11:50:01.2069049Z Process 1 exited with error code 10 and exception: 2025-12-04T11:50:01.2069194Z Traceback (most recent call last): 2025-12-04T11:50:01.2069458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2069732Z getattr(self, test_name)() 2025-12-04T11:50:01.2069961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2070189Z fn() 2025-12-04T11:50:01.2070387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2070614Z method(*args, **kwargs) 2025-12-04T11:50:01.2070830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2071056Z method(*args, **kwargs) 2025-12-04T11:50:01.2071275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2071498Z with policy(): 2025-12-04T11:50:01.2071709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2071942Z raise RuntimeError(msg) 2025-12-04T11:50:01.2072320Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 2317352960 and is now 3386900480. 
2025-12-04T11:50:01.2072702Z 2025-12-04T11:50:01.2072778Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2073080Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2073311Z 2025-12-04T11:50:01.2073398Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2073523Z 2025-12-04T11:50:01.2073580Z Process 2 exited with error code 10 and exception: 2025-12-04T11:50:01.2073719Z Traceback (most recent call last): 2025-12-04T11:50:01.2073959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2074201Z getattr(self, test_name)() 2025-12-04T11:50:01.2074429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2074657Z fn() 2025-12-04T11:50:01.2074857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2075083Z method(*args, **kwargs) 2025-12-04T11:50:01.2075298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2075524Z method(*args, **kwargs) 2025-12-04T11:50:01.2075770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2075997Z with policy(): 2025-12-04T11:50:01.2076204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2076432Z raise RuntimeError(msg) 2025-12-04T11:50:01.2076809Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 2300575744 and is now 3370123264. 
2025-12-04T11:50:01.2077154Z 2025-12-04T11:50:01.2077228Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2077531Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2077762Z 2025-12-04T11:50:01.2077850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2078006Z 2025-12-04T11:50:01.2078063Z Process 3 exited with error code 10 and exception: 2025-12-04T11:50:01.2078201Z Traceback (most recent call last): 2025-12-04T11:50:01.2078440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2078679Z getattr(self, test_name)() 2025-12-04T11:50:01.2078909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2079138Z fn() 2025-12-04T11:50:01.2079335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2079561Z method(*args, **kwargs) 2025-12-04T11:50:01.2079814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2080041Z method(*args, **kwargs) 2025-12-04T11:50:01.2080256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2080479Z with policy(): 2025-12-04T11:50:01.2080686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2080914Z raise RuntimeError(msg) 2025-12-04T11:50:01.2081288Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 2250244096 and is now 3319791616. 2025-12-04T11:50:01.2081630Z 2025-12-04T11:50:01.2081703Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2082006Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2082237Z 2025-12-04T11:50:01.2082326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2082447Z 2025-12-04T11:50:01.2082449Z 2025-12-04T11:50:01.2082526Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:50:01.2082727Z Process 0 terminated with exit code 10, terminating remaining processes. 
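For context on the RuntimeError repeated for each rank above: it comes from PyTorch's CUDA/ROCm memory-leak check (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots caching-allocator and driver-level memory around the test body and fails the test when both grew. The sketch below illustrates that idea using only public torch.cuda APIs; it is a minimal illustration, not the actual CudaMemoryLeakCheck context manager in torch/testing/_internal/common_utils.py, and the helper name assert_no_cuda_leak plus the use of torch.cuda.mem_get_info for the driver-side number are this sketch's own assumptions.

    # Minimal sketch of a CUDA/ROCm memory-leak check, using only public
    # torch.cuda APIs. NOT the real CudaMemoryLeakCheck from
    # torch/testing/_internal/common_utils.py; names and policy here are
    # illustrative.
    import torch

    def assert_no_cuda_leak(test_fn, device: int = 0) -> None:
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)  # caching-allocator bytes
        free, total = torch.cuda.mem_get_info(device)       # driver-level (free, total)
        driver_before = total - free

        test_fn()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()  # sketch's choice: drop cached blocks before re-measuring
        alloc_after = torch.cuda.memory_allocated(device)
        free, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free

        # Flag a leak only when both views grew, mirroring the shape of the
        # "Caching allocator allocated memory was ... / CUDA driver allocated
        # memory was ..." message in the log above.
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
            )

Requiring growth in both the allocator view and the driver view is what lets the real checker distinguish a genuine leak from the caching allocator merely holding on to reusable blocks.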
2025-12-04T11:50:01.2083091Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-18ed10254416d539.xml - 2025-12-04T11:50:01.2083509Z =========================== short test summary info ============================ 2025-12-04T11:50:01.2083819Z FAILED [7.1130s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:50:01.2084144Z Traceback (most recent call last): 2025-12-04T11:50:01.2084389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2084628Z getattr(self, test_name)() 2025-12-04T11:50:01.2084858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2085088Z fn() 2025-12-04T11:50:01.2085287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2085515Z method(*args, **kwargs) 2025-12-04T11:50:01.2085732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2085959Z method(*args, **kwargs) 2025-12-04T11:50:01.2086177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2086402Z with policy(): 2025-12-04T11:50:01.2086642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2086871Z raise RuntimeError(msg) 2025-12-04T11:50:01.2087251Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 2453667840 and is now 3523215360. 
2025-12-04T11:50:01.2087594Z 2025-12-04T11:50:01.2087669Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2087969Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2088199Z 2025-12-04T11:50:01.2088289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2088413Z 2025-12-04T11:50:01.2088474Z Process 1 exited with error code 10 and exception: 2025-12-04T11:50:01.2088617Z Traceback (most recent call last): 2025-12-04T11:50:01.2088856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2089096Z getattr(self, test_name)() 2025-12-04T11:50:01.2089324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2089555Z fn() 2025-12-04T11:50:01.2089799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2090025Z method(*args, **kwargs) 2025-12-04T11:50:01.2090240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2090466Z method(*args, **kwargs) 2025-12-04T11:50:01.2090683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2090908Z with policy(): 2025-12-04T11:50:01.2091118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2091349Z raise RuntimeError(msg) 2025-12-04T11:50:01.2091725Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 2317352960 and is now 3386900480. 
2025-12-04T11:50:01.2092071Z 2025-12-04T11:50:01.2092144Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2092449Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2092680Z 2025-12-04T11:50:01.2092799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2092925Z 2025-12-04T11:50:01.2092982Z Process 2 exited with error code 10 and exception: 2025-12-04T11:50:01.2093120Z Traceback (most recent call last): 2025-12-04T11:50:01.2093357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2093596Z getattr(self, test_name)() 2025-12-04T11:50:01.2093824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2094052Z fn() 2025-12-04T11:50:01.2094251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2094477Z method(*args, **kwargs) 2025-12-04T11:50:01.2094695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2094957Z method(*args, **kwargs) 2025-12-04T11:50:01.2095171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2095395Z with policy(): 2025-12-04T11:50:01.2095602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2095831Z raise RuntimeError(msg) 2025-12-04T11:50:01.2096208Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 2300575744 and is now 3370123264. 
2025-12-04T11:50:01.2096556Z 2025-12-04T11:50:01.2096629Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2096932Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2097165Z 2025-12-04T11:50:01.2097252Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2097375Z 2025-12-04T11:50:01.2097433Z Process 3 exited with error code 10 and exception: 2025-12-04T11:50:01.2097572Z Traceback (most recent call last): 2025-12-04T11:50:01.2097810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2098050Z getattr(self, test_name)() 2025-12-04T11:50:01.2098280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2098510Z fn() 2025-12-04T11:50:01.2098707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2098932Z method(*args, **kwargs) 2025-12-04T11:50:01.2099149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2099376Z method(*args, **kwargs) 2025-12-04T11:50:01.2099590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2099856Z with policy(): 2025-12-04T11:50:01.2100063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2100292Z raise RuntimeError(msg) 2025-12-04T11:50:01.2100666Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 2250244096 and is now 3319791616. 2025-12-04T11:50:01.2101013Z 2025-12-04T11:50:01.2101118Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2101419Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2101647Z 2025-12-04T11:50:01.2101736Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2101922Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:50:01.2102087Z ======================= 1 failed, 1 deselected in 7.27s ======================== 2025-12-04T11:50:01.2102226Z Got exit code 1 2025-12-04T11:50:01.2102325Z Retrying single test... 
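The "Got exit code 1 / Retrying single test..." lines above, and the "FAILED CONSISTENTLY ... continue-through-error" lines further down, reflect the harness rerunning a failed test once in isolation before classifying it. A rough sketch of that policy, assuming a simple subprocess runner (the function names are illustrative, not the actual pytorch/pytorch run_test.py code):

    # Rough sketch of the retry policy visible in this log (illustrative only,
    # not the actual run_test.py implementation).
    import subprocess
    from typing import List

    def run_once(cmd: List[str]) -> int:
        return subprocess.call(cmd)  # returns the child process exit code

    def run_with_retry(cmd: List[str], continue_through_error: bool = True) -> bool:
        if run_once(cmd) == 0:
            return True
        print("Got exit code 1")
        print("Retrying single test...")
        if run_once(cmd) == 0:
            return True               # passed on retry: treated as flaky, not fatal
        print("FAILED CONSISTENTLY")
        if not continue_through_error:
            raise SystemExit(1)       # abort the whole job
        return False                  # record the failure and move on to the next file

Because continue-through-error is set in this job, the consistent failure below is recorded and the run proceeds to the remaining distributed test files.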
2025-12-04T11:50:01.2102592Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-06a4867daa702f85.xml 2025-12-04T11:50:01.2102889Z ============================= test session starts ============================== 2025-12-04T11:50:01.2103102Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:50:01.2103321Z cachedir: .pytest_cache 2025-12-04T11:50:01.2103545Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:50:01.2103780Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:50:01.2103900Z configfile: pytest.ini 2025-12-04T11:50:01.2104127Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:50:01.2104396Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T11:50:01.2104691Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T11:50:01.2104957Z Running 1 items in this shard 2025-12-04T11:50:01.2105030Z 2025-12-04T11:50:01.2105309Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 11:49:51.358000 223812 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 223881 2025-12-04T11:50:01.2105777Z I1204 11:49:51.359000 223812 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 223882 2025-12-04T11:50:01.2106120Z I1204 11:49:51.359000 223812 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 223883 2025-12-04T11:50:01.2106459Z I1204 11:49:51.360000 223812 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 223884 2025-12-04T11:50:01.2106935Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T11:50:01.2107303Z return func(*args, **kwargs) 2025-12-04T11:50:01.2107522Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.2107865Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.2108352Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2108831Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.2109310Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2109798Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.2110268Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2110732Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2111197Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2111661Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2112128Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2112609Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.2113062Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2113525Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.2114150Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 2317352960 and is now 3386900480. 
2025-12-04T11:50:01.2114740Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2115090Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2115641Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2116114Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2116478Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2116896Z [rank1]:E1204 11:49:56.772000 223882 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:50:01.2117146Z dist init r=1, world=4 2025-12-04T11:50:01.2117349Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.2117686Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.2118168Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2118642Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.2119144Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2119594Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.2120076Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2120540Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2121004Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2121464Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2121955Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2122406Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.2122855Z 
[rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2123317Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.2123944Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 2250244096 and is now 3319791616. 2025-12-04T11:50:01.2124532Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2124728Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2125048Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2125162Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2125379Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2125545Z [rank3]:E1204 11:49:56.774000 223884 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:50:01.2125590Z dist init r=3, world=4 2025-12-04T11:50:01.2125730Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.2125890Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.2126179Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2126365Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.2126652Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2126777Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.2127054Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2127201Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2127480Z [rank0]:E1204 11:49:56.786000 223881 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2127653Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2127927Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2128065Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.2128342Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2128494Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.2128930Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 2453667840 and is now 3523215360. 2025-12-04T11:50:01.2129047Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2129242Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2129562Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2129678Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2129924Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2130090Z [rank0]:E1204 11:49:56.786000 223881 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:50:01.2130129Z dist init r=0, world=4 2025-12-04T11:50:01.2130270Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:50:01.2130430Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:50:01.2130743Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2130901Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:50:01.2131184Z [rank2]:E1204 11:49:56.829000 223883 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2131310Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:50:01.2131589Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2131767Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2132042Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2132191Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:50:01.2132467Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2132603Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:50:01.2132885Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2133034Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:50:01.2133469Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 2300575744 and is now 3370123264. 
2025-12-04T11:50:01.2133585Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2133781Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2134103Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2134216Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:50:01.2134428Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2134591Z [rank2]:E1204 11:49:56.829000 223883 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:50:01.2134634Z dist init r=2, world=4 2025-12-04T11:50:01.2134992Z [rank0]:[W1204 11:49:57.487584431 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:50:01.2135035Z FAILED [7.1128s] [100%] 2025-12-04T11:50:01.2135038Z 2025-12-04T11:50:01.2135098Z =================================== FAILURES =================================== 2025-12-04T11:50:01.2135186Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________ 2025-12-04T11:50:01.2135235Z Traceback (most recent call last): 2025-12-04T11:50:01.2135397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:50:01.2135445Z self._join_processes(fn) 2025-12-04T11:50:01.2135618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:50:01.2135678Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:50:01.2135877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:50:01.2135923Z raise RuntimeError(error) 2025-12-04T11:50:01.2136004Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:50:01.2136052Z Traceback (most recent call last): 2025-12-04T11:50:01.2136214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2136258Z getattr(self, test_name)() 2025-12-04T11:50:01.2136416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2136453Z fn() 2025-12-04T11:50:01.2136604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2136650Z method(*args, **kwargs) 2025-12-04T11:50:01.2136802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2136845Z method(*args, **kwargs) 2025-12-04T11:50:01.2136997Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2137038Z with policy(): 2025-12-04T11:50:01.2137190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2137235Z raise RuntimeError(msg) 2025-12-04T11:50:01.2137546Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 2453667840 and is now 3523215360. 2025-12-04T11:50:01.2137550Z 2025-12-04T11:50:01.2137628Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2137825Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2137827Z 2025-12-04T11:50:01.2137915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2137918Z 2025-12-04T11:50:01.2137979Z Process 1 exited with error code 10 and exception: 2025-12-04T11:50:01.2138025Z Traceback (most recent call last): 2025-12-04T11:50:01.2138188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2138233Z getattr(self, test_name)() 2025-12-04T11:50:01.2138393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2138430Z fn() 2025-12-04T11:50:01.2138602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2138644Z method(*args, **kwargs) 2025-12-04T11:50:01.2138796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2138836Z method(*args, **kwargs) 2025-12-04T11:50:01.2138987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2139025Z with policy(): 2025-12-04T11:50:01.2139178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2139220Z raise RuntimeError(msg) 2025-12-04T11:50:01.2139535Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 2317352960 and is now 3386900480. 
2025-12-04T11:50:01.2139560Z 2025-12-04T11:50:01.2139638Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2139862Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2139864Z 2025-12-04T11:50:01.2139955Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2139957Z 2025-12-04T11:50:01.2140016Z Process 3 exited with error code 10 and exception: 2025-12-04T11:50:01.2140064Z Traceback (most recent call last): 2025-12-04T11:50:01.2140225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2140270Z getattr(self, test_name)() 2025-12-04T11:50:01.2140432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2140470Z fn() 2025-12-04T11:50:01.2140622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2140664Z method(*args, **kwargs) 2025-12-04T11:50:01.2140815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2140857Z method(*args, **kwargs) 2025-12-04T11:50:01.2141008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2141048Z with policy(): 2025-12-04T11:50:01.2141201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2141242Z raise RuntimeError(msg) 2025-12-04T11:50:01.2141555Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 2250244096 and is now 3319791616. 2025-12-04T11:50:01.2141559Z 2025-12-04T11:50:01.2141634Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2141827Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2141829Z 2025-12-04T11:50:01.2141915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2141917Z 2025-12-04T11:50:01.2141919Z 2025-12-04T11:50:01.2141996Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:50:01.2142085Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:50:01.2142364Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-06a4867daa702f85.xml - 2025-12-04T11:50:01.2142431Z =========================== short test summary info ============================ 2025-12-04T11:50:01.2142642Z FAILED [7.1128s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:50:01.2142690Z Traceback (most recent call last): 2025-12-04T11:50:01.2142854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2142899Z getattr(self, test_name)() 2025-12-04T11:50:01.2143059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2143096Z fn() 2025-12-04T11:50:01.2143247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2143290Z method(*args, **kwargs) 2025-12-04T11:50:01.2143471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2143513Z method(*args, **kwargs) 2025-12-04T11:50:01.2143663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2143701Z with policy(): 2025-12-04T11:50:01.2143853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2143896Z raise RuntimeError(msg) 2025-12-04T11:50:01.2144205Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 2453667840 and is now 3523215360. 
2025-12-04T11:50:01.2144207Z 2025-12-04T11:50:01.2144285Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2144480Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2144482Z 2025-12-04T11:50:01.2144569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2144571Z 2025-12-04T11:50:01.2144632Z Process 1 exited with error code 10 and exception: 2025-12-04T11:50:01.2144678Z Traceback (most recent call last): 2025-12-04T11:50:01.2144842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2144884Z getattr(self, test_name)() 2025-12-04T11:50:01.2145044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2145078Z fn() 2025-12-04T11:50:01.2145232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2145274Z method(*args, **kwargs) 2025-12-04T11:50:01.2145425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2145467Z method(*args, **kwargs) 2025-12-04T11:50:01.2145618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2145655Z with policy(): 2025-12-04T11:50:01.2145808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2145850Z raise RuntimeError(msg) 2025-12-04T11:50:01.2146178Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 2317352960 and is now 3386900480. 
2025-12-04T11:50:01.2146182Z 2025-12-04T11:50:01.2146259Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2146453Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2146455Z 2025-12-04T11:50:01.2146544Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2146546Z 2025-12-04T11:50:01.2146606Z Process 3 exited with error code 10 and exception: 2025-12-04T11:50:01.2146653Z Traceback (most recent call last): 2025-12-04T11:50:01.2146815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:50:01.2146860Z getattr(self, test_name)() 2025-12-04T11:50:01.2147020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:50:01.2147056Z fn() 2025-12-04T11:50:01.2147240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2147283Z method(*args, **kwargs) 2025-12-04T11:50:01.2147433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:50:01.2147476Z method(*args, **kwargs) 2025-12-04T11:50:01.2147625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:50:01.2147664Z with policy(): 2025-12-04T11:50:01.2147816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:50:01.2147859Z raise RuntimeError(msg) 2025-12-04T11:50:01.2148169Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 2250244096 and is now 3319791616. 2025-12-04T11:50:01.2148173Z 2025-12-04T11:50:01.2148247Z To execute this test, run the following from the base repo dir: 2025-12-04T11:50:01.2148440Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T11:50:01.2148442Z 2025-12-04T11:50:01.2148530Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:50:01.2148596Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:50:01.2148659Z ======================= 1 failed, 1 deselected in 7.27s ======================== 2025-12-04T11:50:01.2148699Z Got exit code 1 2025-12-04T11:50:01.2148845Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T11:50:01.2148976Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:50:01.2149177Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9919e8b2203d8ff4.xml 2025-12-04T11:50:01.2149239Z ============================= test session starts ============================== 2025-12-04T11:50:01.2149350Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:50:01.2149394Z cachedir: .pytest_cache 2025-12-04T11:50:01.2149553Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:50:01.2149602Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:50:01.2149643Z configfile: pytest.ini 2025-12-04T11:50:01.2149838Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:50:01.2149942Z collecting ... collected 2 items / 2 deselected / 0 selected 2025-12-04T11:50:01.2149998Z stepcurrent: skipping 2 already run items. 2025-12-04T11:50:01.2150047Z Running 0 items in this shard 2025-12-04T11:50:01.2150049Z 2025-12-04T11:50:01.2150292Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9919e8b2203d8ff4.xml - 2025-12-04T11:50:01.2150354Z ============================ 2 deselected in 0.00s ============================= 2025-12-04T11:50:01.2150644Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda', 'test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda'] 2025-12-04T11:50:01.2150647Z 2025-12-04T11:50:01.2150843Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 1/1 (test/test-reports/distributed.fsdp.test_fsdp_pure_fp16_1.1_409903cf23c40aa5_.log) 2025-12-04T11:50:01.2150845Z 2025-12-04T11:50:01.2151420Z Finished distributed/fsdp/test_fsdp_pure_fp16 1/1 ... [2025-12-04 11:50:01.174575][2234143.649601195], took 1.13min 2025-12-04T11:50:01.2151691Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:50:01.2151780Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:01.2151876Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T11:50:01.2151927Z Uploading artifacts took 0.00 seconds 2025-12-04T11:50:01.2151987Z distributed/fsdp/test_fsdp_pure_fp16 1/1 failed! 2025-12-04T11:50:01.2152114Z Running distributed/checkpoint/test_checkpoint 1/1 ... 
[2025-12-04 11:50:01.177235][2234143.652265227] 2025-12-04T11:50:01.2152163Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:01.2152494Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:50:01.177396] 2025-12-04T11:50:34.6941356Z 2025-12-04T11:50:34.6945346Z distributed/checkpoint/test_checkpoint 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_checkpoint_1.1_a9891d14294497a7_.log 2025-12-04T11:50:34.6948743Z Running 8 items in this shard: test/distributed/checkpoint/test_checkpoint.py::TestDistributedCheckpointing::test_default_metadata, test/distributed/checkpoint/test_checkpoint.py::TestDistributedCheckpointing::test_tensor_metadata_with_missing_rank_spec, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_dummy_reader_works, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_dummy_writer_works, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_load_error_handling, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_load_error_handling_no_dist, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_save_error_handling, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_save_error_handling_no_dist 2025-12-04T11:50:34.6951562Z 2025-12-04T11:50:34.6951873Z Finished distributed/checkpoint/test_checkpoint 1/1 ... [2025-12-04 11:50:34.693871][2234177.168896339], took 0.56min 2025-12-04T11:50:34.6953202Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:50:34.6969169Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:34.6971965Z Running distributed/_pycute/test_coalesce 1/1 ... [2025-12-04 11:50:34.697106][2234177.172135358] 2025-12-04T11:50:34.6972655Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:34.6974045Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_pycute/test_coalesce.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:50:34.697280] 2025-12-04T11:50:36.8654164Z 2025-12-04T11:50:36.8655003Z distributed/_pycute/test_coalesce 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_coalesce_1.1_eb6f455e54ba9b04_.log 2025-12-04T11:50:36.8655859Z Running 1 items in this shard: test/distributed/_pycute/test_coalesce.py::TestCoalesce::test_coalesce 2025-12-04T11:50:36.8656190Z 2025-12-04T11:50:36.8656454Z Finished distributed/_pycute/test_coalesce 1/1 ... [2025-12-04 11:50:36.865166][2234179.340190861], took 0.04min 2025-12-04T11:50:36.8669287Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:50:36.8685600Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:36.8687819Z Running distributed/_pycute/test_complement 1/1 ... 
[2025-12-04 11:50:36.868662][2234179.343691494] 2025-12-04T11:50:36.8688176Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:36.8689464Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_pycute/test_complement.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:50:36.868825] 2025-12-04T11:50:38.9871522Z 2025-12-04T11:50:38.9872907Z distributed/_pycute/test_complement 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_complement_1.1_10031e63c54af11b_.log 2025-12-04T11:50:38.9874083Z Running 1 items in this shard: test/distributed/_pycute/test_complement.py::TestComplement::test_complement 2025-12-04T11:50:38.9874523Z 2025-12-04T11:50:38.9874857Z Finished distributed/_pycute/test_complement 1/1 ... [2025-12-04 11:50:38.986748][2234181.461773559], took 0.04min 2025-12-04T11:50:38.9885791Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:50:38.9901726Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:38.9904242Z Running distributed/_pycute/test_composition 1/1 ... [2025-12-04 11:50:38.990238][2234181.465268332] 2025-12-04T11:50:38.9904623Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:38.9905382Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_pycute/test_composition.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:50:38.990404] 2025-12-04T11:50:41.0084280Z 2025-12-04T11:50:41.0085842Z distributed/_pycute/test_composition 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_composition_1.1_b283c213fd01570c_.log 2025-12-04T11:50:41.0087125Z Running 1 items in this shard: test/distributed/_pycute/test_composition.py::TestComposition::test_composition 2025-12-04T11:50:41.0087626Z 2025-12-04T11:50:41.0088025Z Finished distributed/_pycute/test_composition 1/1 ... [2025-12-04 11:50:41.008072][2234183.483097707], took 0.03min 2025-12-04T11:50:41.0097175Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:50:41.0112852Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:41.0115812Z Running distributed/_pycute/test_int_tuple 1/1 ... [2025-12-04 11:50:41.011392][2234183.486422064] 2025-12-04T11:50:41.0116232Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:41.0116974Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_pycute/test_int_tuple.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:50:41.011558] 2025-12-04T11:50:43.0294340Z 2025-12-04T11:50:43.0295478Z distributed/_pycute/test_int_tuple 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_int_tuple_1.1_18bee4af7cb66692_.log 2025-12-04T11:50:43.0298952Z Running 12 items in this shard: test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_basic, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_idx2crd_roundtrip, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_int_with_tuple_shape, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_none, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_tuple, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_basic, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_crd2idx_roundtrip, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_tuple, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_inner_product, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_product, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_shape_div, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_suffix_product 2025-12-04T11:50:43.0302441Z 2025-12-04T11:50:43.0302734Z Finished distributed/_pycute/test_int_tuple 1/1 ... [2025-12-04 11:50:43.029177][2234185.504203381], took 0.03min 2025-12-04T11:50:43.0308664Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:50:43.0325070Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:43.0328117Z Running distributed/_pycute/test_left_inverse 1/1 ... [2025-12-04 11:50:43.032654][2234185.507684335] 2025-12-04T11:50:43.0328670Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:43.0329677Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_pycute/test_left_inverse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:50:43.032817] 2025-12-04T11:50:45.0502846Z 2025-12-04T11:50:45.0504070Z distributed/_pycute/test_left_inverse 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_left_inverse_1.1_d89177bf72e1560a_.log 2025-12-04T11:50:45.0505017Z Running 1 items in this shard: test/distributed/_pycute/test_left_inverse.py::TestLeftInverse::test_left_inverse 2025-12-04T11:50:45.0505378Z 2025-12-04T11:50:45.0505719Z Finished distributed/_pycute/test_left_inverse 1/1 ... [2025-12-04 11:50:45.050014][2234187.525039031], took 0.03min 2025-12-04T11:50:45.0517026Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:50:45.0532683Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:45.0535197Z Running distributed/_pycute/test_right_inverse 1/1 ... 
[2025-12-04 11:50:45.053359][2234187.528389118] 2025-12-04T11:50:45.0535537Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:45.0537026Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_pycute/test_right_inverse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:50:45.053534] 2025-12-04T11:50:47.0713102Z 2025-12-04T11:50:47.0714516Z distributed/_pycute/test_right_inverse 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_right_inverse_1.1_d83ac7eeb5e1d7f0_.log 2025-12-04T11:50:47.0715810Z Running 1 items in this shard: test/distributed/_pycute/test_right_inverse.py::TestRightInverse::test_right_inverse 2025-12-04T11:50:47.0716322Z 2025-12-04T11:50:47.0716715Z Finished distributed/_pycute/test_right_inverse 1/1 ... [2025-12-04 11:50:47.071005][2234189.546030649], took 0.03min 2025-12-04T11:50:47.0728036Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:50:47.0744020Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:47.0746589Z Running distributed/tensor/debug/test_debug_mode 1/1 ... [2025-12-04 11:50:47.074460][2234189.549490013] 2025-12-04T11:50:47.0747691Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:47.0748633Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/debug/test_debug_mode.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:50:47.074624] 2025-12-04T11:51:22.5957822Z 2025-12-04T11:51:22.5959273Z distributed/tensor/debug/test_debug_mode 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.debug.test_debug_mode_1.1_2c8b404a48f3dc67_.log 2025-12-04T11:51:22.5966354Z Running 25 items in this shard: test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_hash_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_structure_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_triton_hash_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_compile, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_backward, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_densor_redistribution_trace, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_einsum, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_higher_order_cond, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_mm, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_string_inside_context, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_fake_tensor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_False_has_outer_mode_False, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_False_has_outer_mode_True, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_True_has_outer_mode_False, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_True_has_outer_mode_True, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nn_module, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_pretty_print_dtensor_make_fx, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_real_tensor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_tensor_attributes, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_tensor_hash_redistribute, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_triton_kernel_logs, test/distributed/tensor/debug/test_debug_mode.py::TestDebugModeUtils::test_hash_empty_tenor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_base, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_base_async_op, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_functional_with_async_collective_tensor 2025-12-04T11:51:22.5970397Z 2025-12-04T11:51:22.5970549Z Finished distributed/tensor/debug/test_debug_mode 1/1 ... 
[2025-12-04 11:51:22.595442][2234225.070468791], took 0.59min 2025-12-04T11:51:22.5971012Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T11:51:22.5981738Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:51:22.5984107Z Running distributed/fsdp/test_fsdp_apply 1/1 ... [2025-12-04 11:51:22.598330][2234225.073360308] 2025-12-04T11:51:22.5984321Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:51:22.5986182Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_apply.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:51:22.598497] 2025-12-04T11:52:36.3593000Z 2025-12-04T11:52:36.3593976Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_apply 1/1 (test/test-reports/distributed.fsdp.test_fsdp_apply_1.1_40041d9465e03b91_.log) 2025-12-04T11:52:36.3594673Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-85ee52af0a9cd0d8.xml 2025-12-04T11:52:36.3595133Z ============================= test session starts ============================== 2025-12-04T11:52:36.3595500Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3595807Z cachedir: .pytest_cache 2025-12-04T11:52:36.3596247Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3596667Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3596904Z configfile: pytest.ini 2025-12-04T11:52:36.3597306Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3597720Z collecting ... collected 3 items 2025-12-04T11:52:36.3597962Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:52:36.3598833Z Running 3 items in this shard: test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda, test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda, test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda 2025-12-04T11:52:36.3599536Z 2025-12-04T11:52:36.3600081Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda I1204 11:51:24.257000 232315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 232384 2025-12-04T11:52:36.3600852Z I1204 11:51:24.258000 232315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 232385 2025-12-04T11:52:36.3601709Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3602431Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3603779Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3604547Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3605136Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3605699Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3606438Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3607194Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3607377Z File "<string>", line 1, in <module> 2025-12-04T11:52:36.3607789Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main 2025-12-04T11:52:36.3608057Z exitcode = _main(fd, parent_sentinel) 2025-12-04T11:52:36.3608305Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/spawn.py", line 135, in _main 2025-12-04T11:52:36.3608559Z return self._bootstrap(parent_sentinel) 2025-12-04T11:52:36.3608827Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap 2025-12-04T11:52:36.3609081Z self.run() 2025-12-04T11:52:36.3609294Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/process.py", line 108, in run 2025-12-04T11:52:36.3609545Z self._target(*self._args, **self._kwargs) 2025-12-04T11:52:36.3609891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py", line 1272, in _run 2025-12-04T11:52:36.3610197Z self.run_test(test_name, pipe) 2025-12-04T11:52:36.3610521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3610845Z getattr(self, test_name)() 2025-12-04T11:52:36.3611389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3611695Z fn() 2025-12-04T11:52:36.3611986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3612269Z method(*args, **kwargs) 2025-12-04T11:52:36.3612501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3612743Z method(*args, **kwargs) 2025-12-04T11:52:36.3612973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3613217Z method(*args, **kwargs) 2025-12-04T11:52:36.3613476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test 2025-12-04T11:52:36.3613748Z result = test(self, **param_kwargs) 2025-12-04T11:52:36.3614003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 227, in wrapper
2025-12-04T11:52:36.3614260Z return func(*args, **kwargs) 2025-12-04T11:52:36.3614512Z File "/var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_apply.py", line 113, in test_apply_in_summon_raises_error 2025-12-04T11:52:36.3614780Z transformer.apply(self._init_linear_weights) 2025-12-04T11:52:36.3615064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 586, in apply 2025-12-04T11:52:36.3615337Z self._assert_state(TrainingState.IDLE) 2025-12-04T11:52:36.3615689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1028, in _assert_state 2025-12-04T11:52:36.3615971Z traceback.print_stack() 2025-12-04T11:52:36.3616204Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3616562Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3617069Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3617571Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3618071Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3618579Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3619040Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3619525Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3620048Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3620534Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3621019Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3621488Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3621959Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3622445Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3623105Z [rank0]:E1204 11:51:28.108000 
232384 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 2025-12-04T11:52:36.3623714Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3624070Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3624654Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3625188Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3625565Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3625990Z [rank0]:E1204 11:51:28.108000 232384 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3626340Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3626685Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3627178Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3627663Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3628193Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3628648Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3629098Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3629570Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3630089Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3630562Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3631038Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3631500Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3631968Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3632440Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3633087Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3633689Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3634046Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3634666Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3635157Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3635531Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3635952Z [rank1]:E1204 11:51:28.108000 232385 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3636204Z dist init r=0, world=2 2025-12-04T11:52:36.3636348Z Asserting FSDP instance is: FullyShardedDataParallel( 2025-12-04T11:52:36.3636526Z (_fsdp_wrapped_module): TransformerWithSharedParams( 2025-12-04T11:52:36.3636684Z (embed_tokens): Embedding(23, 16) 2025-12-04T11:52:36.3636819Z (transformer): Transformer( 2025-12-04T11:52:36.3636949Z (encoder): TransformerEncoder( 2025-12-04T11:52:36.3637125Z (layers): ModuleList( 2025-12-04T11:52:36.3637251Z (0-1): 2 x FullyShardedDataParallel( 2025-12-04T11:52:36.3637409Z (_fsdp_wrapped_module): TransformerEncoderLayer( 2025-12-04T11:52:36.3637566Z (self_attn): MultiheadAttention( 2025-12-04T11:52:36.3637773Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3637966Z ) 2025-12-04T11:52:36.3638100Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2025-12-04T11:52:36.3638275Z (dropout): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3638443Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2025-12-04T11:52:36.3638629Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3638814Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3638987Z (dropout1): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3639140Z (dropout2): 
Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3639270Z ) 2025-12-04T11:52:36.3639364Z ) 2025-12-04T11:52:36.3639455Z ) 2025-12-04T11:52:36.3639581Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3639772Z ) 2025-12-04T11:52:36.3639874Z (decoder): TransformerDecoder( 2025-12-04T11:52:36.3640004Z (layers): ModuleList( 2025-12-04T11:52:36.3640133Z (0-1): 2 x FullyShardedDataParallel( 2025-12-04T11:52:36.3640287Z (_fsdp_wrapped_module): TransformerDecoderLayer( 2025-12-04T11:52:36.3640440Z (self_attn): MultiheadAttention( 2025-12-04T11:52:36.3640640Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3640830Z ) 2025-12-04T11:52:36.3640947Z (multihead_attn): MultiheadAttention( 2025-12-04T11:52:36.3641157Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3641349Z ) 2025-12-04T11:52:36.3641480Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2025-12-04T11:52:36.3641646Z (dropout): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3641810Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2025-12-04T11:52:36.3641992Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3642172Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3642349Z (norm3): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3642515Z (dropout1): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3642662Z (dropout2): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3642856Z (dropout3): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3642988Z ) 2025-12-04T11:52:36.3643084Z ) 2025-12-04T11:52:36.3643176Z ) 2025-12-04T11:52:36.3643301Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3643446Z ) 2025-12-04T11:52:36.3643535Z ) 2025-12-04T11:52:36.3643667Z (output_proj): Linear(in_features=16, out_features=23, bias=True) 2025-12-04T11:52:36.3643878Z (bn): BatchNorm1d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) 2025-12-04T11:52:36.3644048Z ) 2025-12-04T11:52:36.3644138Z ) 2025-12-04T11:52:36.3644323Z ERROR: expected to be in states [<TrainingState.IDLE: 1>] but current state is TrainingState.SUMMON_FULL_PARAMS 2025-12-04T11:52:36.3644536Z dist init r=1, world=2 2025-12-04T11:52:36.3644978Z [rank0]:[W1204 11:51:28.784663453 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources.
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3645431Z FAILED [5.3093s] [ 33%] 2025-12-04T11:52:36.3645497Z 2025-12-04T11:52:36.3645565Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3645761Z _____________ TestApplyCUDA.test_apply_in_summon_raises_error_cuda _____________ 2025-12-04T11:52:36.3645941Z Traceback (most recent call last): 2025-12-04T11:52:36.3646199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3646452Z self._join_processes(fn) 2025-12-04T11:52:36.3646705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3646975Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3647254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3647526Z raise RuntimeError(error) 2025-12-04T11:52:36.3647684Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3647852Z Traceback (most recent call last): 2025-12-04T11:52:36.3648099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3648351Z getattr(self, test_name)() 2025-12-04T11:52:36.3648589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3648827Z fn() 2025-12-04T11:52:36.3649038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3649295Z method(*args, **kwargs) 2025-12-04T11:52:36.3649528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3649808Z method(*args, **kwargs) 2025-12-04T11:52:36.3650035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3650271Z with policy(): 2025-12-04T11:52:36.3650492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3650729Z raise RuntimeError(msg) 2025-12-04T11:52:36.3651127Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 
2025-12-04T11:52:36.3651482Z 2025-12-04T11:52:36.3651564Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3651928Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3652174Z 2025-12-04T11:52:36.3652270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3652396Z 2025-12-04T11:52:36.3652463Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3652611Z Traceback (most recent call last): 2025-12-04T11:52:36.3652862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3653111Z getattr(self, test_name)() 2025-12-04T11:52:36.3653350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3653590Z fn() 2025-12-04T11:52:36.3653797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3654037Z method(*args, **kwargs) 2025-12-04T11:52:36.3654265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3654539Z method(*args, **kwargs) 2025-12-04T11:52:36.3654768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3655004Z with policy(): 2025-12-04T11:52:36.3655222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3655461Z raise RuntimeError(msg) 2025-12-04T11:52:36.3655859Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3656218Z 2025-12-04T11:52:36.3656297Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3656621Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3656867Z 2025-12-04T11:52:36.3656957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3657087Z 2025-12-04T11:52:36.3657089Z 2025-12-04T11:52:36.3657172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3657380Z Process 0 terminated with exit code 10, terminating remaining processes. 
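Note the `_init_utils.py:571` UserWarning printed by both ranks earlier in this run: `device_id` was passed as a bare `cuda` device with no index, so FSDP fell back to the current device. A minimal sketch of the two remedies the warning names, under the assumption of a typical one-GPU-per-rank launch (the rendezvous env vars and the `nn.Linear` stand-in module are illustrative, not from this test):

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are set, e.g. by torchrun.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)                      # remedy 1: set an explicit current device
model = FSDP(nn.Linear(16, 16), device_id=rank)  # remedy 2: pass an explicit device index

Either remedy removes the ambiguity and silences the warning; the harness here relies on the current-device fallback, so the warning stays informational.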
2025-12-04T11:52:36.3657750Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-85ee52af0a9cd0d8.xml - 2025-12-04T11:52:36.3658088Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3658417Z FAILED [5.3093s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3658729Z Traceback (most recent call last): 2025-12-04T11:52:36.3658982Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3659232Z getattr(self, test_name)() 2025-12-04T11:52:36.3659471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3659754Z fn() 2025-12-04T11:52:36.3659962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3660201Z method(*args, **kwargs) 2025-12-04T11:52:36.3660429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3660666Z method(*args, **kwargs) 2025-12-04T11:52:36.3660924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3661161Z with policy(): 2025-12-04T11:52:36.3661381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3661622Z raise RuntimeError(msg) 2025-12-04T11:52:36.3662021Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 
2025-12-04T11:52:36.3662385Z 2025-12-04T11:52:36.3662465Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3662788Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3663032Z 2025-12-04T11:52:36.3663160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3663285Z 2025-12-04T11:52:36.3663352Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3663501Z Traceback (most recent call last): 2025-12-04T11:52:36.3663752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3664003Z getattr(self, test_name)() 2025-12-04T11:52:36.3664246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3664489Z fn() 2025-12-04T11:52:36.3664698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3664939Z method(*args, **kwargs) 2025-12-04T11:52:36.3665167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3665410Z method(*args, **kwargs) 2025-12-04T11:52:36.3665636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3665871Z with policy(): 2025-12-04T11:52:36.3666088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3666330Z raise RuntimeError(msg) 2025-12-04T11:52:36.3666726Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3667085Z 2025-12-04T11:52:36.3667167Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3667487Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3667731Z 2025-12-04T11:52:36.3667825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3668021Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:52:36.3668186Z ============================== 1 failed in 5.47s =============================== 2025-12-04T11:52:36.3668326Z Got exit code 1 2025-12-04T11:52:36.3668433Z Retrying single test... 
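The RuntimeError above is raised by the harness's CUDA memory-leak checker rather than by the test body: this shard runs with mem_leak_check (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 in the printed repro line), which records per-device allocator counters before each test and fails the test if they have grown once it returns. A simplified before/after sketch of that pattern follows; it is illustrative only, and the actual checker in torch/testing/_internal/common_utils.py is more elaborate (it also compares driver-level allocation, which is where the 2017460224 -> 2252341248 numbers above come from).

# Simplified sketch of a before/after CUDA memory-leak check; illustrative
# only, not the actual implementation used by the PyTorch test harness.
import torch

class LeakCheck:
    def __enter__(self):
        torch.cuda.synchronize()
        # Snapshot caching-allocator usage on every visible device.
        self.before = [torch.cuda.memory_allocated(d)
                       for d in range(torch.cuda.device_count())]
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # don't mask the test's own exception
        torch.cuda.synchronize()
        for dev, before in enumerate(self.before):
            after = torch.cuda.memory_allocated(dev)
            if after > before:  # counters grew across the test: report a leak
                raise RuntimeError(
                    f"possible leak on device {dev}: caching allocator "
                    f"was {before}, is now {after}")
        return False

Wrapping a test body in `with LeakCheck():` reproduces the failure mode seen here: the test's own assertion passes, but the exit-time comparison raises, which the multiprocess runner then reports per rank.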
2025-12-04T11:52:36.3668699Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-f621c7c7a6125b2c.xml 2025-12-04T11:52:36.3668992Z ============================= test session starts ============================== 2025-12-04T11:52:36.3669211Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3669476Z cachedir: .pytest_cache 2025-12-04T11:52:36.3669750Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3670001Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3670128Z configfile: pytest.ini 2025-12-04T11:52:36.3670361Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3670636Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T11:52:36.3670949Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3671230Z Running 1 items in this shard 2025-12-04T11:52:36.3671308Z 2025-12-04T11:52:36.3671607Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda I1204 11:51:31.876000 232543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 232612 2025-12-04T11:52:36.3672128Z I1204 11:51:31.877000 232543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 232613 2025-12-04T11:52:36.3672690Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3673140Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3673729Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3674329Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3674790Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3675234Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3675821Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:52:36.3676416Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3676566Z File "<string>", line 1, in <module> 2025-12-04T11:52:36.3676775Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main 2025-12-04T11:52:36.3676986Z exitcode = _main(fd, parent_sentinel) 2025-12-04T11:52:36.3677183Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/spawn.py", line 135, in _main 2025-12-04T11:52:36.3677386Z return self._bootstrap(parent_sentinel) 2025-12-04T11:52:36.3677598Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap 2025-12-04T11:52:36.3677798Z self.run() 2025-12-04T11:52:36.3677968Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/process.py", line 108, in run 2025-12-04T11:52:36.3678169Z self._target(*self._args, **self._kwargs) 2025-12-04T11:52:36.3678411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py", line 1272, in _run 2025-12-04T11:52:36.3678650Z self.run_test(test_name, pipe) 2025-12-04T11:52:36.3678939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3679191Z getattr(self, test_name)() 2025-12-04T11:52:36.3679431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3679671Z fn() 2025-12-04T11:52:36.3679918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3680158Z method(*args, **kwargs) 2025-12-04T11:52:36.3680390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3680630Z method(*args, **kwargs) 2025-12-04T11:52:36.3680856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3681094Z method(*args, **kwargs) 2025-12-04T11:52:36.3681347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test 2025-12-04T11:52:36.3681666Z result = test(self, **param_kwargs) 2025-12-04T11:52:36.3681918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 227, in wrapper 2025-12-04T11:52:36.3682164Z return func(*args, **kwargs) 2025-12-04T11:52:36.3682410Z File "/var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_apply.py", line 113, in test_apply_in_summon_raises_error 2025-12-04T11:52:36.3682672Z transformer.apply(self._init_linear_weights) 2025-12-04T11:52:36.3682946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 586, in apply 2025-12-04T11:52:36.3683214Z self._assert_state(TrainingState.IDLE) 2025-12-04T11:52:36.3683494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1028, in _assert_state 2025-12-04T11:52:36.3683771Z traceback.print_stack() 2025-12-04T11:52:36.3683993Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3684345Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T11:52:36.3684843Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3685334Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3685824Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3686291Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3686743Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3687221Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3687696Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3688200Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3688674Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3689139Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3689603Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3690119Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3690769Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 
2025-12-04T11:52:36.3691412Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3691769Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3692340Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3692829Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3693207Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3693633Z [rank0]:E1204 11:51:35.779000 232612 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3693884Z dist init r=0, world=2 2025-12-04T11:52:36.3694027Z Asserting FSDP instance is: FullyShardedDataParallel( 2025-12-04T11:52:36.3694208Z (_fsdp_wrapped_module): TransformerWithSharedParams( 2025-12-04T11:52:36.3694365Z (embed_tokens): Embedding(23, 16) 2025-12-04T11:52:36.3694497Z (transformer): Transformer( 2025-12-04T11:52:36.3694625Z (encoder): TransformerEncoder( 2025-12-04T11:52:36.3694755Z (layers): ModuleList( 2025-12-04T11:52:36.3694886Z (0-1): 2 x FullyShardedDataParallel( 2025-12-04T11:52:36.3695042Z (_fsdp_wrapped_module): TransformerEncoderLayer( 2025-12-04T11:52:36.3695200Z (self_attn): MultiheadAttention( 2025-12-04T11:52:36.3695405Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3695598Z ) 2025-12-04T11:52:36.3695732Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2025-12-04T11:52:36.3695899Z (dropout): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3696064Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2025-12-04T11:52:36.3696249Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3696434Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3696601Z (dropout1): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3696749Z (dropout2): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3696878Z ) 2025-12-04T11:52:36.3696974Z ) 2025-12-04T11:52:36.3697066Z ) 2025-12-04T11:52:36.3697238Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3697391Z ) 2025-12-04T11:52:36.3697494Z (decoder): TransformerDecoder( 2025-12-04T11:52:36.3697624Z (layers): ModuleList( 2025-12-04T11:52:36.3697755Z (0-1): 2 x FullyShardedDataParallel( 2025-12-04T11:52:36.3697910Z (_fsdp_wrapped_module): TransformerDecoderLayer( 2025-12-04T11:52:36.3698063Z (self_attn): MultiheadAttention( 2025-12-04T11:52:36.3698266Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3698458Z ) 2025-12-04T11:52:36.3698572Z (multihead_attn): MultiheadAttention( 2025-12-04T11:52:36.3698772Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3698959Z ) 2025-12-04T11:52:36.3699088Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2025-12-04T11:52:36.3699258Z (dropout): Dropout(p=0.1, inplace=False) 
2025-12-04T11:52:36.3699454Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2025-12-04T11:52:36.3699637Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3699887Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3700069Z (norm3): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3700237Z (dropout1): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3700384Z (dropout2): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3700531Z (dropout3): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3700659Z ) 2025-12-04T11:52:36.3700754Z ) 2025-12-04T11:52:36.3700841Z ) 2025-12-04T11:52:36.3700966Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3701109Z ) 2025-12-04T11:52:36.3701199Z ) 2025-12-04T11:52:36.3701338Z (output_proj): Linear(in_features=16, out_features=23, bias=True) 2025-12-04T11:52:36.3701555Z (bn): BatchNorm1d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) 2025-12-04T11:52:36.3701726Z ) 2025-12-04T11:52:36.3701813Z ) 2025-12-04T11:52:36.3701998Z ERROR: expected to be in states [<TrainingState.IDLE: 1>] but current state is TrainingState.SUMMON_FULL_PARAMS 2025-12-04T11:52:36.3702314Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3702664Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3703166Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3703657Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3704147Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3704604Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3705052Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3705532Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3706048Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3706523Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3706994Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3707455Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] with
policy(): 2025-12-04T11:52:36.3707919Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3708396Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3709078Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3709681Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3710113Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3710689Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3711183Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3711561Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3711986Z [rank1]:E1204 11:51:35.783000 232613 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3712238Z dist init r=1, world=2 2025-12-04T11:52:36.3712646Z [rank0]:[W1204 11:51:35.438423434 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3713064Z FAILED [5.3105s] [100%] 2025-12-04T11:52:36.3713134Z 2025-12-04T11:52:36.3713198Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3713392Z _____________ TestApplyCUDA.test_apply_in_summon_raises_error_cuda _____________ 2025-12-04T11:52:36.3713572Z Traceback (most recent call last): 2025-12-04T11:52:36.3713824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3714080Z self._join_processes(fn) 2025-12-04T11:52:36.3714333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3714604Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3714877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3715177Z raise RuntimeError(error) 2025-12-04T11:52:36.3715335Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3715506Z Traceback (most recent call last): 2025-12-04T11:52:36.3715757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3716009Z getattr(self, test_name)() 2025-12-04T11:52:36.3716250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3716488Z fn() 2025-12-04T11:52:36.3716700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3716940Z method(*args, **kwargs) 2025-12-04T11:52:36.3717169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3717406Z method(*args, **kwargs) 2025-12-04T11:52:36.3717636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3717915Z with policy(): 2025-12-04T11:52:36.3718134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3718372Z raise RuntimeError(msg) 2025-12-04T11:52:36.3718769Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 
2025-12-04T11:52:36.3719128Z 2025-12-04T11:52:36.3719212Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3719533Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3719814Z 2025-12-04T11:52:36.3719912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3720039Z 2025-12-04T11:52:36.3720104Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3720254Z Traceback (most recent call last): 2025-12-04T11:52:36.3720505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3720756Z getattr(self, test_name)() 2025-12-04T11:52:36.3720996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3721238Z fn() 2025-12-04T11:52:36.3721446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3721683Z method(*args, **kwargs) 2025-12-04T11:52:36.3721910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3722153Z method(*args, **kwargs) 2025-12-04T11:52:36.3722377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3722611Z with policy(): 2025-12-04T11:52:36.3722828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3723067Z raise RuntimeError(msg) 2025-12-04T11:52:36.3723466Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3723828Z 2025-12-04T11:52:36.3723906Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3724262Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3724512Z 2025-12-04T11:52:36.3724602Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3724733Z 2025-12-04T11:52:36.3724735Z 2025-12-04T11:52:36.3724815Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3725024Z Process 0 terminated with exit code 10, terminating remaining processes. 
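Context for the repro lines above: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 wraps each test in a before/after memory comparison (the `with policy():` frames in the tracebacks) and raises when allocator usage grows, which is the RuntimeError reported here. A minimal sketch of that before/after pattern, using only public torch.cuda APIs rather than the real checker:

    import gc
    import torch

    def run_with_leak_check(test_fn, device=0):
        # settle the allocator before taking the baseline
        gc.collect()
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)  # caching-allocator bytes in use

        test_fn()

        # drop stray references and cached blocks before re-measuring
        gc.collect()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible leak: allocated memory was {before} and is now {after} on device {device}"
            )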
2025-12-04T11:52:36.3725394Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-f621c7c7a6125b2c.xml - 2025-12-04T11:52:36.3725735Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3726063Z FAILED [5.3105s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3726371Z Traceback (most recent call last): 2025-12-04T11:52:36.3726654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3726906Z getattr(self, test_name)() 2025-12-04T11:52:36.3727146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3727386Z fn() 2025-12-04T11:52:36.3727594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3727834Z method(*args, **kwargs) 2025-12-04T11:52:36.3728062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3728299Z method(*args, **kwargs) 2025-12-04T11:52:36.3728530Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3728771Z with policy(): 2025-12-04T11:52:36.3728989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3729227Z raise RuntimeError(msg) 2025-12-04T11:52:36.3729625Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 
2025-12-04T11:52:36.3730018Z 2025-12-04T11:52:36.3730099Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3730418Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3730658Z 2025-12-04T11:52:36.3730754Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3730880Z 2025-12-04T11:52:36.3730944Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3731091Z Traceback (most recent call last): 2025-12-04T11:52:36.3731338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3731588Z getattr(self, test_name)() 2025-12-04T11:52:36.3731827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3732069Z fn() 2025-12-04T11:52:36.3732278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3732515Z method(*args, **kwargs) 2025-12-04T11:52:36.3732741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3733013Z method(*args, **kwargs) 2025-12-04T11:52:36.3733240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3733476Z with policy(): 2025-12-04T11:52:36.3733693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3733933Z raise RuntimeError(msg) 2025-12-04T11:52:36.3734332Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3734686Z 2025-12-04T11:52:36.3734766Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3735088Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3735369Z 2025-12-04T11:52:36.3735464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3735659Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:52:36.3735834Z ======================= 1 failed, 2 deselected in 5.45s ======================== 2025-12-04T11:52:36.3735981Z Got exit code 1 2025-12-04T11:52:36.3736089Z Retrying single test... 
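For orientation before the retry below: per the "ERROR: expected to be in states []" assertion and the stack frames above, FSDP.apply() requires TrainingState.IDLE, while summon_full_params() holds the wrapper in SUMMON_FULL_PARAMS, so the test deliberately provokes an error. A minimal sketch of the provoking pattern, assuming an already-initialized process group and an FSDP-wrapped `fsdp_model` (helper names here are illustrative):

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def init_linear_weights(module):
        # illustrative stand-in for the test's _init_linear_weights
        if isinstance(module, nn.Linear):
            nn.init.ones_(module.weight)

    def apply_inside_summon(fsdp_model: FSDP):
        # FSDP.apply asserts TrainingState.IDLE; summon_full_params moves the
        # wrapper into SUMMON_FULL_PARAMS first, so the apply() below is
        # expected to raise -- which is exactly what the test asserts.
        with FSDP.summon_full_params(fsdp_model):
            fsdp_model.apply(init_linear_weights)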
2025-12-04T11:52:36.3736354Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-617c68a7b864689e.xml 2025-12-04T11:52:36.3736648Z ============================= test session starts ============================== 2025-12-04T11:52:36.3736869Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3737065Z cachedir: .pytest_cache 2025-12-04T11:52:36.3737299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3737552Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3737678Z configfile: pytest.ini 2025-12-04T11:52:36.3737912Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3738189Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T11:52:36.3738501Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3738783Z Running 1 items in this shard 2025-12-04T11:52:36.3738858Z 2025-12-04T11:52:36.3739149Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda I1204 11:51:39.693000 232771 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 232840 2025-12-04T11:52:36.3739629Z I1204 11:51:39.694000 232771 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 232841 2025-12-04T11:52:36.3740241Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3740689Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3741274Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3741869Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3742367Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3742816Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3743397Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:52:36.3743987Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3744134Z File "<string>", line 1, in <module> 2025-12-04T11:52:36.3744341Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main 2025-12-04T11:52:36.3744551Z exitcode = _main(fd, parent_sentinel) 2025-12-04T11:52:36.3744785Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/spawn.py", line 135, in _main 2025-12-04T11:52:36.3744989Z return self._bootstrap(parent_sentinel) 2025-12-04T11:52:36.3745195Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap 2025-12-04T11:52:36.3745390Z self.run() 2025-12-04T11:52:36.3745553Z File "/opt/conda/envs/py_3.12/lib/python3.12/multiprocessing/process.py", line 108, in run 2025-12-04T11:52:36.3745750Z self._target(*self._args, **self._kwargs) 2025-12-04T11:52:36.3745985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py", line 1272, in _run 2025-12-04T11:52:36.3746218Z self.run_test(test_name, pipe) 2025-12-04T11:52:36.3746464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3746723Z getattr(self, test_name)() 2025-12-04T11:52:36.3746965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3747203Z fn() 2025-12-04T11:52:36.3762337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3762609Z method(*args, **kwargs) 2025-12-04T11:52:36.3762855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3763102Z method(*args, **kwargs) 2025-12-04T11:52:36.3763340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3763582Z method(*args, **kwargs) 2025-12-04T11:52:36.3763839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test 2025-12-04T11:52:36.3764112Z result = test(self, **param_kwargs) 2025-12-04T11:52:36.3764369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 227, in wrapper 2025-12-04T11:52:36.3764625Z return func(*args, **kwargs) 2025-12-04T11:52:36.3764881Z File "/var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_apply.py", line 113, in test_apply_in_summon_raises_error 2025-12-04T11:52:36.3765145Z transformer.apply(self._init_linear_weights) 2025-12-04T11:52:36.3765419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 586, in apply 2025-12-04T11:52:36.3765689Z self._assert_state(TrainingState.IDLE) 2025-12-04T11:52:36.3765966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1028, in _assert_state 2025-12-04T11:52:36.3766246Z traceback.print_stack() 2025-12-04T11:52:36.3766561Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3766918Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T11:52:36.3767419Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3767914Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3768405Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3768872Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3769373Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3769909Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3770383Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3770854Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3771328Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3771802Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3772267Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3772744Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3773398Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 
2025-12-04T11:52:36.3774007Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3774362Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3774944Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3775435Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3775856Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3776286Z [rank0]:E1204 11:51:43.491000 232840 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3776544Z dist init r=0, world=2 2025-12-04T11:52:36.3776690Z Asserting FSDP instance is: FullyShardedDataParallel( 2025-12-04T11:52:36.3776871Z (_fsdp_wrapped_module): TransformerWithSharedParams( 2025-12-04T11:52:36.3777033Z (embed_tokens): Embedding(23, 16) 2025-12-04T11:52:36.3777163Z (transformer): Transformer( 2025-12-04T11:52:36.3777294Z (encoder): TransformerEncoder( 2025-12-04T11:52:36.3777430Z (layers): ModuleList( 2025-12-04T11:52:36.3777567Z (0-1): 2 x FullyShardedDataParallel( 2025-12-04T11:52:36.3777728Z (_fsdp_wrapped_module): TransformerEncoderLayer( 2025-12-04T11:52:36.3777883Z (self_attn): MultiheadAttention( 2025-12-04T11:52:36.3778096Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3778337Z ) 2025-12-04T11:52:36.3778472Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2025-12-04T11:52:36.3778645Z (dropout): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3778810Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2025-12-04T11:52:36.3779003Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3779192Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3779363Z (dropout1): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3779519Z (dropout2): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3779657Z ) 2025-12-04T11:52:36.3779796Z ) 2025-12-04T11:52:36.3779895Z ) 2025-12-04T11:52:36.3780030Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3780180Z ) 2025-12-04T11:52:36.3780287Z (decoder): TransformerDecoder( 2025-12-04T11:52:36.3780426Z (layers): ModuleList( 2025-12-04T11:52:36.3780562Z (0-1): 2 x FullyShardedDataParallel( 2025-12-04T11:52:36.3780725Z (_fsdp_wrapped_module): TransformerDecoderLayer( 2025-12-04T11:52:36.3780879Z (self_attn): MultiheadAttention( 2025-12-04T11:52:36.3781080Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3781269Z ) 2025-12-04T11:52:36.3781380Z (multihead_attn): MultiheadAttention( 2025-12-04T11:52:36.3781583Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2025-12-04T11:52:36.3781770Z ) 2025-12-04T11:52:36.3781902Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2025-12-04T11:52:36.3782071Z (dropout): Dropout(p=0.1, inplace=False) 
2025-12-04T11:52:36.3782242Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2025-12-04T11:52:36.3782426Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3782604Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3782785Z (norm3): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3782958Z (dropout1): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3783111Z (dropout2): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3783259Z (dropout3): Dropout(p=0.1, inplace=False) 2025-12-04T11:52:36.3783390Z ) 2025-12-04T11:52:36.3783484Z ) 2025-12-04T11:52:36.3783579Z ) 2025-12-04T11:52:36.3783709Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2025-12-04T11:52:36.3783860Z ) 2025-12-04T11:52:36.3783952Z ) 2025-12-04T11:52:36.3784080Z (output_proj): Linear(in_features=16, out_features=23, bias=True) 2025-12-04T11:52:36.3784330Z (bn): BatchNorm1d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) 2025-12-04T11:52:36.3784509Z ) 2025-12-04T11:52:36.3784600Z ) 2025-12-04T11:52:36.3784788Z ERROR: expected to be in states [] but current state is TrainingState.SUMMON_FULL_PARAMS 2025-12-04T11:52:36.3785107Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3785457Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3785955Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3786450Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3786972Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3787428Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3787879Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3788350Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3788826Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3789304Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3789813Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3790271Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T11:52:36.3790737Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3791213Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3791863Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3792467Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3792828Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3793434Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3793926Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3794306Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3794736Z [rank1]:E1204 11:51:43.493000 232841 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3794988Z dist init r=1, world=2 2025-12-04T11:52:36.3795400Z [rank0]:[W1204 11:51:43.143406688 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3795817Z FAILED [5.2098s] [100%] 2025-12-04T11:52:36.3795892Z 2025-12-04T11:52:36.3795956Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3796181Z _____________ TestApplyCUDA.test_apply_in_summon_raises_error_cuda _____________ 2025-12-04T11:52:36.3796366Z Traceback (most recent call last): 2025-12-04T11:52:36.3796626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3796885Z self._join_processes(fn) 2025-12-04T11:52:36.3797141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3797412Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3797686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3797956Z raise RuntimeError(error) 2025-12-04T11:52:36.3798124Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3798289Z Traceback (most recent call last): 2025-12-04T11:52:36.3798536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3798780Z getattr(self, test_name)() 2025-12-04T11:52:36.3799021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3799261Z fn() 2025-12-04T11:52:36.3799471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3799747Z method(*args, **kwargs) 2025-12-04T11:52:36.3799971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3800206Z method(*args, **kwargs) 2025-12-04T11:52:36.3800429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3800661Z with policy(): 2025-12-04T11:52:36.3800881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3801121Z raise RuntimeError(msg) 2025-12-04T11:52:36.3801523Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 
2025-12-04T11:52:36.3801883Z 2025-12-04T11:52:36.3801963Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3802282Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3802524Z 2025-12-04T11:52:36.3802652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3802783Z 2025-12-04T11:52:36.3802848Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3802994Z Traceback (most recent call last): 2025-12-04T11:52:36.3803239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3803485Z getattr(self, test_name)() 2025-12-04T11:52:36.3803723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3803963Z fn() 2025-12-04T11:52:36.3804169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3804405Z method(*args, **kwargs) 2025-12-04T11:52:36.3804627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3804860Z method(*args, **kwargs) 2025-12-04T11:52:36.3805079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3805338Z with policy(): 2025-12-04T11:52:36.3805549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3805786Z raise RuntimeError(msg) 2025-12-04T11:52:36.3806181Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3806540Z 2025-12-04T11:52:36.3806614Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3806939Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3807179Z 2025-12-04T11:52:36.3807269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3807396Z 2025-12-04T11:52:36.3807398Z 2025-12-04T11:52:36.3807480Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3807685Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:52:36.3808044Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-617c68a7b864689e.xml - 2025-12-04T11:52:36.3808378Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3808705Z FAILED [5.2098s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3809008Z Traceback (most recent call last): 2025-12-04T11:52:36.3809257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3809505Z getattr(self, test_name)() 2025-12-04T11:52:36.3809988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3810225Z fn() 2025-12-04T11:52:36.3810427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3810659Z method(*args, **kwargs) 2025-12-04T11:52:36.3810881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3811113Z method(*args, **kwargs) 2025-12-04T11:52:36.3811332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3811560Z with policy(): 2025-12-04T11:52:36.3811817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3812052Z raise RuntimeError(msg) 2025-12-04T11:52:36.3812445Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2252341248. 
2025-12-04T11:52:36.3812809Z 2025-12-04T11:52:36.3812885Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3813204Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3813447Z 2025-12-04T11:52:36.3813538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3813664Z 2025-12-04T11:52:36.3813728Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3813904Z Traceback (most recent call last): 2025-12-04T11:52:36.3814150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3814397Z getattr(self, test_name)() 2025-12-04T11:52:36.3814632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3814867Z fn() 2025-12-04T11:52:36.3815068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3815302Z method(*args, **kwargs) 2025-12-04T11:52:36.3815524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3815757Z method(*args, **kwargs) 2025-12-04T11:52:36.3815981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3816217Z with policy(): 2025-12-04T11:52:36.3816434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3816669Z raise RuntimeError(msg) 2025-12-04T11:52:36.3817066Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_apply_in_summon_raises_error_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2099249152. 2025-12-04T11:52:36.3817426Z 2025-12-04T11:52:36.3817505Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3817821Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3818061Z 2025-12-04T11:52:36.3818154Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3818350Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:52:36.3818518Z ======================= 1 failed, 2 deselected in 5.37s ======================== 2025-12-04T11:52:36.3818660Z Got exit code 1 2025-12-04T11:52:36.3818873Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda 2025-12-04T11:52:36.3819189Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:52:36.3819551Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5efd2bf17644de42.xml 2025-12-04T11:52:36.3819879Z ============================= test session starts ============================== 2025-12-04T11:52:36.3820093Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3820317Z cachedir: .pytest_cache 2025-12-04T11:52:36.3820547Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3820792Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3820913Z configfile: pytest.ini 2025-12-04T11:52:36.3821145Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3821418Z collecting ... collected 3 items / 1 deselected / 2 selected 2025-12-04T11:52:36.3821580Z stepcurrent: skipping 1 already run items. 2025-12-04T11:52:36.3821715Z Running 2 items in this shard 2025-12-04T11:52:36.3821786Z 2025-12-04T11:52:36.3822064Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda I1204 11:51:47.150000 232999 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 233068 2025-12-04T11:52:36.3822531Z I1204 11:51:47.151000 232999 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 233069 2025-12-04T11:52:36.3823262Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3823849Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3824440Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:52:36.3825030Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3825276Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3825621Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3826117Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3826604Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3827092Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3827547Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3827995Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3828464Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3828932Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3829429Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3829943Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3830402Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3830864Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3831333Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3831968Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 
2025-12-04T11:52:36.3832597Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3832948Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3833505Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3833977Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3834347Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3834766Z [rank1]:E1204 11:51:51.358000 233069 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3835113Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3835454Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3835946Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3836432Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3836916Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3837368Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3837812Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3838279Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3838775Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3839249Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3839758Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3840214Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3840673Z [rank0]:E1204 11:51:51.358000 233068 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3841143Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3841808Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 2025-12-04T11:52:36.3842396Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3842748Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3843302Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3843771Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3844141Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3844561Z [rank0]:E1204 11:51:51.358000 233068 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3844807Z dist init r=1, world=2 2025-12-04T11:52:36.3844912Z dist init r=0, world=2 2025-12-04T11:52:36.3845314Z [rank0]:[W1204 11:51:51.030654376 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3845725Z FAILED [5.6096s] [ 50%] 2025-12-04T11:52:36.3845792Z 2025-12-04T11:52:36.3845851Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3846040Z _________________ TestApplyCUDA.test_nested_module_apply_cuda __________________ 2025-12-04T11:52:36.3846208Z Traceback (most recent call last): 2025-12-04T11:52:36.3846454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3846702Z self._join_processes(fn) 2025-12-04T11:52:36.3846952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3847218Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3847488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3847754Z raise RuntimeError(error) 2025-12-04T11:52:36.3847946Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3848110Z Traceback (most recent call last): 2025-12-04T11:52:36.3848352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3848597Z getattr(self, test_name)() 2025-12-04T11:52:36.3848830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3849063Z fn() 2025-12-04T11:52:36.3849268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3849502Z method(*args, **kwargs) 2025-12-04T11:52:36.3849775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3850008Z method(*args, **kwargs) 2025-12-04T11:52:36.3850232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3850494Z with policy(): 2025-12-04T11:52:36.3850707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3850942Z raise RuntimeError(msg) 2025-12-04T11:52:36.3851328Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3851679Z 2025-12-04T11:52:36.3851758Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3852064Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3852294Z 2025-12-04T11:52:36.3852386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3852514Z 2025-12-04T11:52:36.3852516Z 2025-12-04T11:52:36.3852595Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3852798Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:52:36.3853162Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5efd2bf17644de42.xml - 2025-12-04T11:52:36.3853497Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3853809Z FAILED [5.6096s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3854105Z Traceback (most recent call last): 2025-12-04T11:52:36.3854354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3854601Z getattr(self, test_name)() 2025-12-04T11:52:36.3854842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3855076Z fn() 2025-12-04T11:52:36.3855281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3855511Z method(*args, **kwargs) 2025-12-04T11:52:36.3855736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3855966Z method(*args, **kwargs) 2025-12-04T11:52:36.3856189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3856417Z with policy(): 2025-12-04T11:52:36.3856662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3856898Z raise RuntimeError(msg) 2025-12-04T11:52:36.3857283Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3857634Z 2025-12-04T11:52:36.3857710Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3858014Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3858246Z 2025-12-04T11:52:36.3858334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3858523Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:52:36.3858689Z ======================= 1 failed, 1 deselected in 5.77s ======================== 2025-12-04T11:52:36.3858833Z Got exit code 1 2025-12-04T11:52:36.3858959Z Retrying single test... 
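Aside on the repeated ProcessGroupNCCL warning above: the remedy it points to is an explicit destroy_process_group() before the process exits. A minimal teardown sketch, with illustrative rendezvous values standing in for whatever the launcher provides:

    import os
    import torch.distributed as dist

    def main(rank: int = 0, world_size: int = 1):
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # illustrative
        os.environ.setdefault("MASTER_PORT", "29500")      # illustrative
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # test / training body goes here
        finally:
            # explicit teardown, so the process group does not leak at exit
            dist.destroy_process_group()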
2025-12-04T11:52:36.3859220Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5dbc2b37f6cd7bc8.xml 2025-12-04T11:52:36.3859511Z ============================= test session starts ============================== 2025-12-04T11:52:36.3859756Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3859948Z cachedir: .pytest_cache 2025-12-04T11:52:36.3860173Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3860415Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3860536Z configfile: pytest.ini 2025-12-04T11:52:36.3860765Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3861038Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T11:52:36.3861335Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda 2025-12-04T11:52:36.3861598Z Running 1 items in this shard 2025-12-04T11:52:36.3861673Z 2025-12-04T11:52:36.3861947Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda I1204 11:51:55.023000 233227 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 233296 2025-12-04T11:52:36.3862411Z I1204 11:51:55.023000 233227 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 233297 2025-12-04T11:52:36.3863108Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3863704Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3864296Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:52:36.3864886Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3865130Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3865529Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3866034Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3866531Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3867022Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3867482Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3867935Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3868448Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3868918Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3869394Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3869911Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3870377Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3870842Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3871317Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3871955Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 
2025-12-04T11:52:36.3872545Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3872904Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3873466Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3873941Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3874315Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3874739Z [rank1]:E1204 11:51:59.192000 233297 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3874985Z dist init r=1, world=2 2025-12-04T11:52:36.3875227Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3875573Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3876069Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3876556Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3877041Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3877504Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3877997Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3878467Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3878935Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3879405Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3879916Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3880377Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3880839Z [rank0]:E1204 
11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3881313Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3881951Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 2025-12-04T11:52:36.3882548Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3882902Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3883463Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3883934Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3884304Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3884757Z [rank0]:E1204 11:51:59.200000 233296 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3885008Z dist init r=0, world=2 2025-12-04T11:52:36.3885415Z [rank0]:[W1204 11:51:59.890493632 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3885828Z FAILED [5.7096s] [100%] 2025-12-04T11:52:36.3885892Z 2025-12-04T11:52:36.3885954Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3886141Z _________________ TestApplyCUDA.test_nested_module_apply_cuda __________________ 2025-12-04T11:52:36.3886314Z Traceback (most recent call last): 2025-12-04T11:52:36.3886568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3886822Z self._join_processes(fn) 2025-12-04T11:52:36.3887107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3887376Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3887650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3887915Z raise RuntimeError(error) 2025-12-04T11:52:36.3888070Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3888238Z Traceback (most recent call last): 2025-12-04T11:52:36.3888481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3888729Z getattr(self, test_name)() 2025-12-04T11:52:36.3888968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3889210Z fn() 2025-12-04T11:52:36.3889418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3889653Z method(*args, **kwargs) 2025-12-04T11:52:36.3889918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3890156Z method(*args, **kwargs) 2025-12-04T11:52:36.3890379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3890613Z with policy(): 2025-12-04T11:52:36.3890829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3891069Z raise RuntimeError(msg) 2025-12-04T11:52:36.3891460Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 
2025-12-04T11:52:36.3891810Z 2025-12-04T11:52:36.3891889Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3892196Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3892427Z 2025-12-04T11:52:36.3892518Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3892646Z 2025-12-04T11:52:36.3892706Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3892852Z Traceback (most recent call last): 2025-12-04T11:52:36.3893100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3893380Z getattr(self, test_name)() 2025-12-04T11:52:36.3893616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3893857Z fn() 2025-12-04T11:52:36.3894061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3894297Z method(*args, **kwargs) 2025-12-04T11:52:36.3894521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3894753Z method(*args, **kwargs) 2025-12-04T11:52:36.3894977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3895211Z with policy(): 2025-12-04T11:52:36.3895427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3895666Z raise RuntimeError(msg) 2025-12-04T11:52:36.3896048Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3896428Z 2025-12-04T11:52:36.3896505Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3896810Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3897040Z 2025-12-04T11:52:36.3897130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3897257Z 2025-12-04T11:52:36.3897259Z 2025-12-04T11:52:36.3897338Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3897543Z Process 0 terminated with exit code 10, terminating remaining processes. 
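Both attempts re-emit the FSDP UserWarning seen above: `device_id` was passed as a bare "cuda" device with no index, so FSDP falls back to the current device on each rank. The warning's own suggestion is to pin the device per rank before wrapping, or to pass an explicit index; a minimal sketch, assuming the default process group is already initialized with one GPU per rank (the module here is a stand-in, not the test's actual model):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes dist.init_process_group(...) has already run, one GPU per rank,
    # as the test harness above does before the test body executes.
    rank = dist.get_rank()
    torch.cuda.set_device(rank)               # option 1: pin the current device first
    model = nn.Linear(8, 8)                   # stand-in for the test's module
    fsdp_model = FSDP(model, device_id=rank)  # option 2: pass an explicit device index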
2025-12-04T11:52:36.3897909Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-5dbc2b37f6cd7bc8.xml - 2025-12-04T11:52:36.3898248Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3898561Z FAILED [5.7096s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3898857Z Traceback (most recent call last): 2025-12-04T11:52:36.3899105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3899350Z getattr(self, test_name)() 2025-12-04T11:52:36.3899588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3899860Z fn() 2025-12-04T11:52:36.3900067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3900308Z method(*args, **kwargs) 2025-12-04T11:52:36.3900533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3900768Z method(*args, **kwargs) 2025-12-04T11:52:36.3900988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3901219Z with policy(): 2025-12-04T11:52:36.3901431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3901669Z raise RuntimeError(msg) 2025-12-04T11:52:36.3902094Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 
2025-12-04T11:52:36.3902444Z 2025-12-04T11:52:36.3902524Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3902830Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3903058Z 2025-12-04T11:52:36.3903150Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3903274Z 2025-12-04T11:52:36.3903338Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3903483Z Traceback (most recent call last): 2025-12-04T11:52:36.3903730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3903978Z getattr(self, test_name)() 2025-12-04T11:52:36.3904216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3904452Z fn() 2025-12-04T11:52:36.3904689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3904920Z method(*args, **kwargs) 2025-12-04T11:52:36.3905143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3905372Z method(*args, **kwargs) 2025-12-04T11:52:36.3905593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3905823Z with policy(): 2025-12-04T11:52:36.3906039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3906275Z raise RuntimeError(msg) 2025-12-04T11:52:36.3906660Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3907009Z 2025-12-04T11:52:36.3907088Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3907391Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3907620Z 2025-12-04T11:52:36.3907713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3907906Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:52:36.3908074Z ======================= 1 failed, 2 deselected in 5.87s ======================== 2025-12-04T11:52:36.3908214Z Got exit code 1 2025-12-04T11:52:36.3908316Z Retrying single test... 
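The ProcessGroupNCCL warning in the run above fires because the spawned ranks exit without calling destroy_process_group(). The lifecycle it asks for pairs every init with an explicit teardown; a minimal sketch, assuming MASTER_ADDR and MASTER_PORT are set in the environment:

    # Sketch of the process-group lifecycle the ProcessGroupNCCL warning asks
    # for: every init_process_group is paired with destroy_process_group.
    import torch.distributed as dist

    def main(rank: int, world_size: int) -> None:
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            ...  # per-rank test / training body
        finally:
            dist.destroy_process_group()  # avoids the resource-leak warning at exit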
2025-12-04T11:52:36.3908581Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-4ff71aa14e27aa86.xml 2025-12-04T11:52:36.3908874Z ============================= test session starts ============================== 2025-12-04T11:52:36.3909089Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3909280Z cachedir: .pytest_cache 2025-12-04T11:52:36.3909510Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3909823Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3909947Z configfile: pytest.ini 2025-12-04T11:52:36.3910178Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3910451Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T11:52:36.3910782Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda 2025-12-04T11:52:36.3911053Z Running 1 items in this shard 2025-12-04T11:52:36.3911126Z 2025-12-04T11:52:36.3911403Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda I1204 11:52:02.938000 233455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 233524 2025-12-04T11:52:36.3911872Z I1204 11:52:02.939000 233455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 233525 2025-12-04T11:52:36.3912574Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3913169Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3913761Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:52:36.3914402Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3914647Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3914996Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3915511Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3916008Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3916500Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3916958Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3917414Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3917894Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3918373Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3918845Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3919319Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3919819Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3920318Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3920792Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3921434Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 
2025-12-04T11:52:36.3922024Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3922376Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3922930Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3923084Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3923301Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3923468Z [rank1]:E1204 11:52:07.163000 233525 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3923510Z dist init r=1, world=2 2025-12-04T11:52:36.3923652Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3923817Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3924115Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3924273Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3924563Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3924689Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3924981Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3925139Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3925420Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3925573Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3925854Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3926016Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3926301Z [rank0]:E1204 
11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3926458Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3926901Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 2025-12-04T11:52:36.3927018Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3927222Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3927568Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3927687Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3927902Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3928072Z [rank0]:E1204 11:52:07.238000 233524 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3928116Z dist init r=0, world=2 2025-12-04T11:52:36.3928463Z [rank0]:[W1204 11:52:07.013064098 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3928510Z FAILED [5.7094s] [100%] 2025-12-04T11:52:36.3928512Z 2025-12-04T11:52:36.3928569Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3928660Z _________________ TestApplyCUDA.test_nested_module_apply_cuda __________________ 2025-12-04T11:52:36.3928711Z Traceback (most recent call last): 2025-12-04T11:52:36.3928882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3928928Z self._join_processes(fn) 2025-12-04T11:52:36.3929108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3929166Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3929351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3929396Z raise RuntimeError(error) 2025-12-04T11:52:36.3929483Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3929532Z Traceback (most recent call last): 2025-12-04T11:52:36.3929736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3929781Z getattr(self, test_name)() 2025-12-04T11:52:36.3929949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3929987Z fn() 2025-12-04T11:52:36.3930180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3930227Z method(*args, **kwargs) 2025-12-04T11:52:36.3930386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3930428Z method(*args, **kwargs) 2025-12-04T11:52:36.3930584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3930629Z with policy(): 2025-12-04T11:52:36.3930784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3930832Z raise RuntimeError(msg) 2025-12-04T11:52:36.3931146Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3931149Z 2025-12-04T11:52:36.3931265Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3931460Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3931462Z 2025-12-04T11:52:36.3931557Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3931559Z 2025-12-04T11:52:36.3931561Z 2025-12-04T11:52:36.3931640Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3931736Z Process 1 terminated with exit code 10, terminating remaining processes. 
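The outer traceback above (`_join_processes` -> `_check_return_codes`) is the harness joining the spawned ranks and converting any nonzero exit code (10 here) into the RuntimeError shown. A generic sketch of that spawn/join/check pattern; this is not the common_distributed code itself:

    # Generic spawn/join/check pattern, in the spirit of _join_processes /
    # _check_return_codes in the traceback above (not the actual harness code).
    import torch.multiprocessing as mp

    def _run(rank: int, world_size: int) -> None:
        ...  # per-rank test body; a caught failure exits with a nonzero code

    def run_in_processes(world_size: int = 2) -> None:
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=_run, args=(r, world_size))
                 for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for r, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {r} exited with error code {p.exitcode}")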
2025-12-04T11:52:36.3931980Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-4ff71aa14e27aa86.xml - 2025-12-04T11:52:36.3932048Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3932267Z FAILED [5.7094s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3932317Z Traceback (most recent call last): 2025-12-04T11:52:36.3932488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3932533Z getattr(self, test_name)() 2025-12-04T11:52:36.3932696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3932731Z fn() 2025-12-04T11:52:36.3932887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3932929Z method(*args, **kwargs) 2025-12-04T11:52:36.3933088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3933130Z method(*args, **kwargs) 2025-12-04T11:52:36.3933290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3933330Z with policy(): 2025-12-04T11:52:36.3933487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3933532Z raise RuntimeError(msg) 2025-12-04T11:52:36.3933850Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_nested_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3933852Z 2025-12-04T11:52:36.3933929Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3934157Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_nested_module_apply_cuda 2025-12-04T11:52:36.3934161Z 2025-12-04T11:52:36.3934254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3934320Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:52:36.3934388Z ======================= 1 failed, 2 deselected in 5.87s ======================== 2025-12-04T11:52:36.3934428Z Got exit code 1 2025-12-04T11:52:36.3934577Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda 2025-12-04T11:52:36.3934708Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:52:36.3934907Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-14bba085fc81e0df.xml 2025-12-04T11:52:36.3934968Z ============================= test session starts ============================== 2025-12-04T11:52:36.3935087Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3935155Z cachedir: .pytest_cache 2025-12-04T11:52:36.3935318Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3935368Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3935415Z configfile: pytest.ini 2025-12-04T11:52:36.3935578Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3935658Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T11:52:36.3935713Z stepcurrent: skipping 2 already run items. 2025-12-04T11:52:36.3935764Z Running 1 items in this shard 2025-12-04T11:52:36.3935766Z 2025-12-04T11:52:36.3936055Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda I1204 11:52:11.032000 233683 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 233752 2025-12-04T11:52:36.3936214Z I1204 11:52:11.033000 233683 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 233753 2025-12-04T11:52:36.3936586Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3936637Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3937143Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3937208Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3937572Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3937626Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3938121Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3938186Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3938352Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3938522Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3938817Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3938980Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3939272Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3939405Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3939749Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3939902Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3940188Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3940341Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3940627Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3940768Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3941055Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3941211Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3941671Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 
2025-12-04T11:52:36.3941797Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3941996Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3942332Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3942452Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3942668Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3942868Z [rank0]:E1204 11:52:15.512000 233752 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3942911Z dist init r=0, world=2 2025-12-04T11:52:36.3943056Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3943218Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3943516Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3943675Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3943976Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3944134Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3944420Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3944576Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3944857Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3945010Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3945293Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3945437Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3945721Z 
[rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3945876Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3946337Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3946457Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3946660Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3946987Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3947126Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3947344Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3947517Z [rank1]:E1204 11:52:15.515000 233753 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3947563Z dist init r=1, world=2 2025-12-04T11:52:36.3947903Z [rank0]:[W1204 11:52:15.295113056 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3947950Z FAILED [6.1095s] [100%] 2025-12-04T11:52:36.3947952Z 2025-12-04T11:52:36.3948009Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3948102Z _______________ TestApplyCUDA.test_transformer_module_apply_cuda _______________ 2025-12-04T11:52:36.3948172Z Traceback (most recent call last): 2025-12-04T11:52:36.3948341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3948388Z self._join_processes(fn) 2025-12-04T11:52:36.3948571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3948628Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3948814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3948860Z raise RuntimeError(error) 2025-12-04T11:52:36.3948945Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3948992Z Traceback (most recent call last): 2025-12-04T11:52:36.3949163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3949209Z getattr(self, test_name)() 2025-12-04T11:52:36.3949377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3949414Z fn() 2025-12-04T11:52:36.3949573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3949618Z method(*args, **kwargs) 2025-12-04T11:52:36.3949810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3949858Z method(*args, **kwargs) 2025-12-04T11:52:36.3950012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3950054Z with policy(): 2025-12-04T11:52:36.3950212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3950263Z raise RuntimeError(msg) 2025-12-04T11:52:36.3950587Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 2025-12-04T11:52:36.3950589Z 2025-12-04T11:52:36.3950668Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3950871Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3950873Z 2025-12-04T11:52:36.3950968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3950970Z 2025-12-04T11:52:36.3950972Z 2025-12-04T11:52:36.3951089Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3951182Z Process 0 terminated with exit code 10, terminating remaining processes. 
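The transformer variant additionally triggers the enable_nested_tensor UserWarning, because the encoder layer was built with batch_first left at its default of False. Constructing the layer with batch_first=True (and feeding batch-first inputs) is what the warning asks for; a minimal sketch with illustrative sizes:

    # Sketch addressing the enable_nested_tensor UserWarning above: build the
    # encoder layer with batch_first=True so the nested-tensor path applies.
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=6, enable_nested_tensor=True)
    # inputs must then be (batch, seq, feature) rather than (seq, batch, feature)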
2025-12-04T11:52:36.3951428Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-14bba085fc81e0df.xml - 2025-12-04T11:52:36.3951490Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3951715Z FAILED [6.1095s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3951764Z Traceback (most recent call last): 2025-12-04T11:52:36.3951934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3951977Z getattr(self, test_name)() 2025-12-04T11:52:36.3952146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3952214Z fn() 2025-12-04T11:52:36.3952374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3952418Z method(*args, **kwargs) 2025-12-04T11:52:36.3952576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3952619Z method(*args, **kwargs) 2025-12-04T11:52:36.3952778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3952819Z with policy(): 2025-12-04T11:52:36.3952977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3953020Z raise RuntimeError(msg) 2025-12-04T11:52:36.3953352Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 2025-12-04T11:52:36.3953356Z 2025-12-04T11:52:36.3953436Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3953636Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3953638Z 2025-12-04T11:52:36.3953732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3953797Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:52:36.3953865Z ======================= 1 failed, 2 deselected in 6.26s ======================== 2025-12-04T11:52:36.3953905Z Got exit code 1 2025-12-04T11:52:36.3953953Z Retrying single test... 
2025-12-04T11:52:36.3954152Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-a6b05aaca6b3e813.xml 2025-12-04T11:52:36.3954216Z ============================= test session starts ============================== 2025-12-04T11:52:36.3954331Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3954379Z cachedir: .pytest_cache 2025-12-04T11:52:36.3954539Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3954590Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3954634Z configfile: pytest.ini 2025-12-04T11:52:36.3954800Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3954874Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T11:52:36.3955096Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda 2025-12-04T11:52:36.3955144Z Running 1 items in this shard 2025-12-04T11:52:36.3955149Z 2025-12-04T11:52:36.3955433Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda I1204 11:52:19.418000 233911 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 233980 2025-12-04T11:52:36.3955596Z I1204 11:52:19.419000 233911 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 233981 2025-12-04T11:52:36.3955959Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3956014Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3956687Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3956782Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3957145Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3957194Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3957691Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:52:36.3957757Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3957906Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3958072Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3958369Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3958530Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3958827Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3958958Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3959240Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3959394Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3959742Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3959901Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3960183Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3960326Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3960617Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3960769Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3961255Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 
2025-12-04T11:52:36.3961375Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3961579Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3961910Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3962031Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3962251Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3962420Z [rank0]:E1204 11:52:23.828000 233980 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3962466Z dist init r=0, world=2 2025-12-04T11:52:36.3962608Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3962776Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3963072Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3963236Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3963526Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3963658Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3963943Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3964124Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3964410Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3964561Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3964847Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3964987Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3965278Z 
[rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3965462Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3965913Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3966037Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3966238Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3966572Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3966688Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3966908Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3967080Z [rank1]:E1204 11:52:23.834000 233981 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3967122Z dist init r=1, world=2 2025-12-04T11:52:36.3967469Z [rank0]:[W1204 11:52:24.515240176 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3967512Z FAILED [5.9098s] [100%] 2025-12-04T11:52:36.3967514Z 2025-12-04T11:52:36.3967575Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3967665Z _______________ TestApplyCUDA.test_transformer_module_apply_cuda _______________ 2025-12-04T11:52:36.3967718Z Traceback (most recent call last): 2025-12-04T11:52:36.3967884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3967935Z self._join_processes(fn) 2025-12-04T11:52:36.3968114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3968171Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3968378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3968431Z raise RuntimeError(error) 2025-12-04T11:52:36.3968513Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3968565Z Traceback (most recent call last): 2025-12-04T11:52:36.3968730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3968780Z getattr(self, test_name)() 2025-12-04T11:52:36.3968946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3968983Z fn() 2025-12-04T11:52:36.3969143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3969187Z method(*args, **kwargs) 2025-12-04T11:52:36.3969348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3969415Z method(*args, **kwargs) 2025-12-04T11:52:36.3969574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3969615Z with policy(): 2025-12-04T11:52:36.3969818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3969863Z raise RuntimeError(msg) 2025-12-04T11:52:36.3970191Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 
2025-12-04T11:52:36.3970193Z 2025-12-04T11:52:36.3970270Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3970475Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3970479Z 2025-12-04T11:52:36.3970571Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3970577Z 2025-12-04T11:52:36.3970641Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3970695Z Traceback (most recent call last): 2025-12-04T11:52:36.3970862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3970910Z getattr(self, test_name)() 2025-12-04T11:52:36.3971071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3971116Z fn() 2025-12-04T11:52:36.3971271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3971317Z method(*args, **kwargs) 2025-12-04T11:52:36.3971473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3971520Z method(*args, **kwargs) 2025-12-04T11:52:36.3971674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3971716Z with policy(): 2025-12-04T11:52:36.3971872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3971919Z raise RuntimeError(msg) 2025-12-04T11:52:36.3972241Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3972243Z 2025-12-04T11:52:36.3972353Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3972555Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3972561Z 2025-12-04T11:52:36.3972651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3972653Z 2025-12-04T11:52:36.3972655Z 2025-12-04T11:52:36.3972738Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3972828Z Process 0 terminated with exit code 10, terminating remaining processes. 
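Besides the leak itself, every attempt ends with the ProcessGroupNCCL warning above that destroy_process_group() was never called before exit. The shutdown pattern the linked docs recommend is to tear the group down in a finally block; a minimal sketch, assuming MASTER_ADDR and MASTER_PORT are already exported for the default env:// rendezvous and that rank/world_size come from the launcher:

    # Minimal shutdown sketch for the destroy_process_group() warning.
    # rank and world_size are placeholders supplied by the launcher;
    # env:// rendezvous assumes MASTER_ADDR/MASTER_PORT are exported.
    import torch.distributed as dist

    def worker(rank: int, world_size: int) -> None:
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # collective work goes here
        finally:
            dist.destroy_process_group()  # explicit teardown, no warning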
2025-12-04T11:52:36.3973074Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-a6b05aaca6b3e813.xml - 2025-12-04T11:52:36.3973137Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3973366Z FAILED [5.9098s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:52:36.3973446Z Traceback (most recent call last): 2025-12-04T11:52:36.3973617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3973663Z getattr(self, test_name)() 2025-12-04T11:52:36.3973830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3973867Z fn() 2025-12-04T11:52:36.3977127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3977182Z method(*args, **kwargs) 2025-12-04T11:52:36.3977347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3977396Z method(*args, **kwargs) 2025-12-04T11:52:36.3977558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3977597Z with policy(): 2025-12-04T11:52:36.3977755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3977797Z raise RuntimeError(msg) 2025-12-04T11:52:36.3978121Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 
2025-12-04T11:52:36.3978124Z 2025-12-04T11:52:36.3978200Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3978404Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3978409Z 2025-12-04T11:52:36.3978499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3978501Z 2025-12-04T11:52:36.3978560Z Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3978611Z Traceback (most recent call last): 2025-12-04T11:52:36.3978777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3978826Z getattr(self, test_name)() 2025-12-04T11:52:36.3978990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3979031Z fn() 2025-12-04T11:52:36.3979186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3979233Z method(*args, **kwargs) 2025-12-04T11:52:36.3979423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3979470Z method(*args, **kwargs) 2025-12-04T11:52:36.3979622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3979662Z with policy(): 2025-12-04T11:52:36.3979853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3979897Z raise RuntimeError(msg) 2025-12-04T11:52:36.3980220Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3980222Z 2025-12-04T11:52:36.3980298Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3980500Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3980540Z 2025-12-04T11:52:36.3980631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3980703Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:52:36.3980769Z ======================= 1 failed, 2 deselected in 6.07s ======================== 2025-12-04T11:52:36.3980812Z Got exit code 1 2025-12-04T11:52:36.3980855Z Retrying single test... 
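Each attempt also repeats a UserWarning from FSDP initialization (visible again in the retry below): `device_id` was passed as a bare "cuda" device with no index, so FSDP falls back to the current device. The warning names both fixes; a hedged sketch, where rank and the Linear module are placeholders and an initialized process group is assumed, as in the harness:

    # Sketch of the two fixes the FSDP device_id warning suggests.
    # Placeholders: rank stands in for the process rank, nn.Linear for
    # the real model; an initialized process group is assumed.
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    rank = 0

    # Fix 1: pin the current device first, then "cuda" resolves correctly.
    torch.cuda.set_device(rank)
    wrapped = FSDP(nn.Linear(8, 8), device_id=torch.device("cuda"))

    # Fix 2: pass an explicit index so FSDP never has to guess.
    wrapped = FSDP(nn.Linear(8, 8), device_id=torch.device("cuda", rank))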
2025-12-04T11:52:36.3981056Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-3b5be2e8a8c31d5c.xml 2025-12-04T11:52:36.3981116Z ============================= test session starts ============================== 2025-12-04T11:52:36.3981237Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:52:36.3981282Z cachedir: .pytest_cache 2025-12-04T11:52:36.3981449Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:52:36.3981498Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:52:36.3981542Z configfile: pytest.ini 2025-12-04T11:52:36.3981707Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:52:36.3981784Z collecting ... collected 3 items / 2 deselected / 1 selected 2025-12-04T11:52:36.3981982Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda 2025-12-04T11:52:36.3982034Z Running 1 items in this shard 2025-12-04T11:52:36.3982037Z 2025-12-04T11:52:36.3982324Z distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda I1204 11:52:27.605000 234139 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 234208 2025-12-04T11:52:36.3982485Z I1204 11:52:27.605000 234139 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 234209 2025-12-04T11:52:36.3982848Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3982897Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3983397Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:52:36.3983488Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3983851Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T11:52:36.3983901Z self.encoder = TransformerEncoder( 2025-12-04T11:52:36.3984401Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:52:36.3984470Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:52:36.3984619Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3984818Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3985116Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3985275Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3985566Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3985699Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3985986Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3986139Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3986421Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3986570Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3986858Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3987000Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3987284Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3987438Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3987914Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 
2025-12-04T11:52:36.3988036Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3988237Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3988570Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3988686Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3988907Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3989083Z [rank1]:E1204 11:52:32.007000 234209 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:52:36.3989146Z dist init r=1, world=2 2025-12-04T11:52:36.3989291Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:52:36.3989454Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:52:36.3989799Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3989960Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:52:36.3990254Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3990382Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:52:36.3990668Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3990823Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3991103Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3991258Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:52:36.3991542Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3991683Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:52:36.3991967Z 
[rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3992123Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:52:36.3992603Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2017460224 and is now 2323644416. 2025-12-04T11:52:36.3992722Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3992925Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3993253Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3993376Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:52:36.3993617Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3993788Z [rank0]:E1204 11:52:32.056000 234208 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:52:36.3993831Z dist init r=0, world=2 2025-12-04T11:52:36.3994176Z [rank0]:[W1204 11:52:32.878651261 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:52:36.3994223Z FAILED [5.9109s] [100%] 2025-12-04T11:52:36.3994225Z 2025-12-04T11:52:36.3994284Z =================================== FAILURES =================================== 2025-12-04T11:52:36.3994380Z _______________ TestApplyCUDA.test_transformer_module_apply_cuda _______________ 2025-12-04T11:52:36.3994431Z Traceback (most recent call last): 2025-12-04T11:52:36.3994603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:52:36.3994650Z self._join_processes(fn) 2025-12-04T11:52:36.3994830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:52:36.3994887Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:52:36.3995075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:52:36.3995119Z raise RuntimeError(error) 2025-12-04T11:52:36.3995206Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3995252Z Traceback (most recent call last): 2025-12-04T11:52:36.3995421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3995469Z getattr(self, test_name)() 2025-12-04T11:52:36.3995635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3995670Z fn() 2025-12-04T11:52:36.3995834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3995877Z method(*args, **kwargs) 2025-12-04T11:52:36.3996033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3996077Z method(*args, **kwargs) 2025-12-04T11:52:36.3996230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3996270Z with policy(): 2025-12-04T11:52:36.3996447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3996495Z raise RuntimeError(msg) 2025-12-04T11:52:36.3996818Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3996820Z 2025-12-04T11:52:36.3996903Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3997105Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3997108Z 2025-12-04T11:52:36.3997200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3997203Z 2025-12-04T11:52:36.3997204Z 2025-12-04T11:52:36.3997283Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:52:36.3997393Z Process 1 terminated with exit code 10, terminating remaining processes. 
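The remaining warning repeated in every attempt comes from nn.TransformerEncoder: enable_nested_tensor=True is quietly disabled because the encoder layer was built with batch_first=False. The fix the warning itself suggests is to construct the layer with batch_first=True; a sketch with arbitrary sizes:

    # Sketch: batch_first=True keeps the enable_nested_tensor fast path
    # usable, as the UserWarning above suggests. Sizes are arbitrary.
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)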
2025-12-04T11:52:36.3997634Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-3b5be2e8a8c31d5c.xml - 2025-12-04T11:52:36.3997696Z =========================== short test summary info ============================ 2025-12-04T11:52:36.3997917Z FAILED [5.9109s] distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:52:36.3997965Z Traceback (most recent call last): 2025-12-04T11:52:36.3998133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:52:36.3998177Z getattr(self, test_name)() 2025-12-04T11:52:36.3998344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:52:36.3998383Z fn() 2025-12-04T11:52:36.3998538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3998581Z method(*args, **kwargs) 2025-12-04T11:52:36.3998738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:52:36.3998778Z method(*args, **kwargs) 2025-12-04T11:52:36.3998933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:52:36.3998973Z with policy(): 2025-12-04T11:52:36.3999132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:52:36.3999175Z raise RuntimeError(msg) 2025-12-04T11:52:36.3999501Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestApplyCUDA.test_transformer_module_apply_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 1864368128 and is now 2170552320. 2025-12-04T11:52:36.3999504Z 2025-12-04T11:52:36.3999585Z To execute this test, run the following from the base repo dir: 2025-12-04T11:52:36.3999816Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_apply.py TestApplyCUDA.test_transformer_module_apply_cuda 2025-12-04T11:52:36.3999819Z 2025-12-04T11:52:36.3999909Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:52:36.3999976Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
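After this third identical failure the runner records the test as failing consistently and, because continue-through-error is set, moves on to the remaining files below, each launched with --shard-id/--num-shards flags. How items map to shards is run_test.py's own logic; purely to illustrate what such flags select, a hypothetical round-robin sketch:

    # Hypothetical sharding helper; NOT run_test.py's actual algorithm,
    # just an illustration of 1-based --shard-id / --num-shards flags.
    def shard(items: list[str], shard_id: int, num_shards: int) -> list[str]:
        return [t for i, t in enumerate(sorted(items))
                if i % num_shards == shard_id - 1]

    # shard(["a.py", "b.py", "c.py"], shard_id=1, num_shards=2) -> ["a.py", "c.py"]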
2025-12-04T11:52:36.4000045Z ======================= 1 failed, 2 deselected in 6.05s ========================
2025-12-04T11:52:36.4000084Z Got exit code 1
2025-12-04T11:52:36.4000270Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda
2025-12-04T11:52:36.4000403Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:52:36.4000602Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-9f50682c733286b8.xml
2025-12-04T11:52:36.4000663Z ============================= test session starts ==============================
2025-12-04T11:52:36.4000782Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T11:52:36.4000826Z cachedir: .pytest_cache
2025-12-04T11:52:36.4000987Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:52:36.4001037Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:52:36.4001081Z configfile: pytest.ini
2025-12-04T11:52:36.4001247Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:52:36.4001356Z collecting ... collected 3 items / 3 deselected / 0 selected
2025-12-04T11:52:36.4001411Z stepcurrent: skipping 3 already run items.
2025-12-04T11:52:36.4001460Z Running 0 items in this shard
2025-12-04T11:52:36.4001462Z
2025-12-04T11:52:36.4001701Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_apply/distributed.fsdp.test_fsdp_apply-9f50682c733286b8.xml -
2025-12-04T11:52:36.4001762Z ============================ 3 deselected in 0.00s =============================
2025-12-04T11:52:36.4002197Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_apply_in_summon_raises_error_cuda', 'test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_nested_module_apply_cuda', 'test/distributed/fsdp/test_fsdp_apply.py::TestApplyCUDA::test_transformer_module_apply_cuda']
2025-12-04T11:52:36.4002200Z
2025-12-04T11:52:36.4002391Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_apply 1/1 (test/test-reports/distributed.fsdp.test_fsdp_apply_1.1_40041d9465e03b91_.log)
2025-12-04T11:52:36.4002396Z
2025-12-04T11:52:36.4002523Z Finished distributed/fsdp/test_fsdp_apply 1/1 ... [2025-12-04 11:52:36.360014][2234298.835039263], took 1.23min
2025-12-04T11:52:36.4002808Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml
2025-12-04T11:52:36.4002899Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T11:52:36.4003000Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading
2025-12-04T11:52:36.4003049Z Uploading artifacts took 0.00 seconds
2025-12-04T11:52:36.4003108Z distributed/fsdp/test_fsdp_apply 1/1 failed!
2025-12-04T11:52:36.4003251Z Running distributed/_composable/fsdp/test_fully_shard_frozen 1/1 ... [2025-12-04 11:52:36.363097][2234298.838126696]
2025-12-04T11:52:36.4003306Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T11:52:36.4003653Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_frozen.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:52:36.363250]
2025-12-04T11:53:18.8472817Z
2025-12-04T11:53:18.8474087Z distributed/_composable/fsdp/test_fully_shard_frozen 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_frozen_1.1_d2e1d89e14e502d6_.log
2025-12-04T11:53:18.8476759Z Running 3 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_frozen.py::TestFullyShardFrozen::test_multi_forward_mixed_requires_grad, test/distributed/_composable/fsdp/test_fully_shard_frozen.py::TestFullyShardFrozen::test_train_mixed_requires_grad_across_groups, test/distributed/_composable/fsdp/test_fully_shard_frozen.py::TestFullyShardFrozen::test_train_mixed_requires_grad_per_group
2025-12-04T11:53:18.8478048Z
2025-12-04T11:53:18.8478352Z Finished distributed/_composable/fsdp/test_fully_shard_frozen 1/1 ... [2025-12-04 11:53:18.846768][2234341.321793172], took 0.71min
2025-12-04T11:53:18.8487007Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml
2025-12-04T11:53:18.8502635Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T11:53:18.8504973Z Running distributed/checkpoint/test_hsdp_checkpoint 1/1 ... [2025-12-04 11:53:18.850331][2234341.325361394]
2025-12-04T11:53:18.8505334Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T11:53:18.8506080Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_hsdp_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:53:18.850498]
2025-12-04T11:53:50.3657300Z
2025-12-04T11:53:50.3658271Z distributed/checkpoint/test_hsdp_checkpoint 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_hsdp_checkpoint_1.1_0b703dd5741fc87e_.log
2025-12-04T11:53:50.3661357Z Running 4 items in this shard: test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_checkpoint_is_even_sharded_model_False, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_checkpoint_is_even_sharded_model_True, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_fsdp_checkpoint_conversion_is_even_sharded_model_False, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_fsdp_checkpoint_conversion_is_even_sharded_model_True
2025-12-04T11:53:50.3663597Z
2025-12-04T11:53:50.3664019Z Finished distributed/checkpoint/test_hsdp_checkpoint 1/1 ... [2025-12-04 11:53:50.365552][2234372.840576893], took 0.53min
2025-12-04T11:53:50.3672662Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml
2025-12-04T11:53:50.3688872Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T11:53:50.3692227Z Running distributed/tensor/parallel/test_parallelize_api 1/1 ... [2025-12-04 11:53:50.369107][2234372.844137176]
2025-12-04T11:53:50.3692565Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T11:53:50.3694303Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/parallel/test_parallelize_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:53:50.369302]
2025-12-04T11:55:51.2750516Z
2025-12-04T11:55:51.2752030Z distributed/tensor/parallel/test_parallelize_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_parallelize_api_1.1_b7d5d59aa6ef25bf_.log
2025-12-04T11:55:51.2767022Z Running 32 items in this shard: test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_empty_plan, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_linear_col_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_linear_row_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_mlp_with_module_api, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_mlp_with_module_api_nested, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_multi_wildcard, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_src_data_rank, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_digit, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_no_match, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_question, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_root_module, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_star, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_input, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_input_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_under_devicemesh_context, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_empty_plan, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_linear_col_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_linear_row_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_mlp_with_module_api, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_mlp_with_module_api_nested, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_multi_wildcard, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_src_data_rank, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_digit, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_no_match, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_question, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_root_module, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_star, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_input, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_input_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_under_devicemesh_context
2025-12-04T11:55:51.2776326Z
2025-12-04T11:55:51.2776552Z Finished distributed/tensor/parallel/test_parallelize_api 1/1 ... [2025-12-04 11:55:51.274664][2234493.749689361], took 2.02min
2025-12-04T11:55:51.2777221Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml
2025-12-04T11:55:51.2780154Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T11:55:51.2782286Z Running distributed/tensor/test_view_ops 1/1 ... [2025-12-04 11:55:51.278107][2234493.753136936]
2025-12-04T11:55:51.2782514Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T11:55:51.2783776Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/tensor/test_view_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:55:51.278270]
2025-12-04T12:01:02.5381794Z
2025-12-04T12:01:02.5382589Z distributed/tensor/test_view_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_view_ops_1.1_763c2e530e311708_.log
2025-12-04T12:01:02.5386809Z Running 20 items in this shard: test/distributed/tensor/test_view_ops.py::TestViewOps::test_complex_view_ops, test/distributed/tensor/test_view_ops.py::TestViewOps::test_dtensor_view_op_uneven, test/distributed/tensor/test_view_ops.py::TestViewOps::test_illegal_views, test/distributed/tensor/test_view_ops.py::TestViewOps::test_squeeze_, test/distributed/tensor/test_view_ops.py::TestViewOps::test_storage_offset_shard_dim0_slice_dim1, test/distributed/tensor/test_view_ops.py::TestViewOps::test_storage_offset_shard_dim1_slice_dim0, test/distributed/tensor/test_view_ops.py::TestViewOps::test_storage_offset_slice, test/distributed/tensor/test_view_ops.py::TestViewOps::test_view_groups, test/distributed/tensor/test_view_ops.py::TestViewOps::test_view_ops, test/distributed/tensor/test_view_ops.py::TestViewOps::test_view_redistribution, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_complex_view_ops, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_dtensor_view_op_uneven, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_illegal_views, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_squeeze_, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_storage_offset_shard_dim0_slice_dim1, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_storage_offset_shard_dim1_slice_dim0, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_storage_offset_slice, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_view_groups, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_view_ops, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_view_redistribution
2025-12-04T12:01:02.5391422Z
2025-12-04T12:01:02.5391587Z Finished distributed/tensor/test_view_ops 1/1 ... [2025-12-04 12:01:02.537873][2234805.012901165], took 5.19min
2025-12-04T12:01:02.5392161Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml
2025-12-04T12:01:02.5406474Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T12:01:02.5406777Z Running distributed/fsdp/test_fsdp_state_dict 1/2 ... [2025-12-04 12:01:02.540547][2234805.015576837]
2025-12-04T12:01:02.5407015Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:01:02.5408660Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_state_dict.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:01:02.540704]
2025-12-04T12:07:28.3682345Z
2025-12-04T12:07:28.3683513Z distributed/fsdp/test_fsdp_state_dict 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_state_dict_1.2_c86e30179fbb8f73_.log
2025-12-04T12:07:28.3713867Z Running 78 items in this shard: test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_True,
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_keys_state_dict_type_local_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_keys_state_dict_type_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_source_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_source_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_rank0_only_and_offload_False, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_source_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_source_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_full_state_dict_missing_unexpected_keys_cleaned, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_False_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_True_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False_fsdp_root_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_True_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_rank0_offload_save_load_flow_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_save_load_flow_state_dict_type_sharded_state_dict, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_type, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_False_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_True_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_False_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_False_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_True_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_False_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_True_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_True_ignore_inner_True_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_shared_parameters_state_dict_type_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_torch_save_load, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict4GPUs::test_local_state_dict_reshard 2025-12-04T12:07:28.3733761Z 2025-12-04T12:07:28.3733904Z Finished distributed/fsdp/test_fsdp_state_dict 1/2 ... [2025-12-04 12:07:28.368919][2235190.84394308], took 6.43min 2025-12-04T12:07:28.3734357Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T12:07:28.3734753Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:07:28.3734997Z Running distributed/_pycute/test_typing 1/1 ... [2025-12-04 12:07:28.372228][2235190.847258062] 2025-12-04T12:07:28.3735197Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:07:28.3735600Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_pycute/test_typing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:07:28.372422] 2025-12-04T12:07:30.4903330Z 2025-12-04T12:07:30.4904232Z distributed/_pycute/test_typing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_typing_1.1_e5ec087a75180543_.log 2025-12-04T12:07:30.4905107Z Running 1 items in this shard: test/distributed/_pycute/test_typing.py::TestTyping::test_typing 2025-12-04T12:07:30.4905431Z 2025-12-04T12:07:30.4906178Z Finished distributed/_pycute/test_typing 1/1 ... 
[2025-12-04 12:07:30.490005][2235192.965028653], took 0.04min 2025-12-04T12:07:30.4914603Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T12:07:30.4929289Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:07:30.4932794Z Running distributed/test_distributed_spawn 1/7 ... [2025-12-04 12:07:30.493165][2235192.968193988] 2025-12-04T12:07:30.4933875Z MPI not available -- MPI backend tests will be skipped 2025-12-04T12:07:30.4935167Z Running distributed tests for the test backend with env init_method 2025-12-04T12:07:30.4936089Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:07:30.4937816Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:07:30.493668] 2025-12-04T12:07:32.5019980Z 2025-12-04T12:07:32.5020946Z distributed/test_distributed_spawn 1/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.7_ec50c9447b8d0b64_.log 2025-12-04T12:07:32.5021754Z Running 0 items in this shard: 2025-12-04T12:07:32.5021927Z 2025-12-04T12:07:32.5028258Z Running distributed tests for the test backend with file init_method 2025-12-04T12:07:32.5028989Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:07:32.5032254Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:07:32.503015] 2025-12-04T12:07:34.5148791Z 2025-12-04T12:07:34.5150121Z distributed/test_distributed_spawn 1/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.7_0c6b6094988ab74a_.log 2025-12-04T12:07:34.5151101Z Running 0 items in this shard: 2025-12-04T12:07:34.5151300Z 2025-12-04T12:07:34.5154537Z Running distributed tests for the nccl backend with env init_method 2025-12-04T12:07:34.5155359Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:07:34.5158125Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:07:34.515642] 2025-12-04T12:11:17.0562072Z 2025-12-04T12:11:17.0563173Z distributed/test_distributed_spawn 1/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.7_bf4c41cf2965d6bf_.log 2025-12-04T12:11:17.0574437Z Running 38 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_with_then_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_v_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device_mesh_initialization, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_join_model_equivalence, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_err_ignore_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_no_set_grad_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_set_grad_to_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_bn_training_vs_eval, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_exception, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_unused_params_rebuild_buckets_exception, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_detect_ddp_is_actually_static, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sync_bn_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T12:11:17.0583088Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel 2025-12-04T12:11:17.0583604Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_with_then_hook 2025-12-04T12:11:17.0584113Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda_complex 2025-12-04T12:11:17.0584563Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_v_cuda 2025-12-04T12:11:17.0585008Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T12:11:17.0585472Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex 2025-12-04T12:11:17.0585913Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda 2025-12-04T12:11:17.0586416Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T12:11:17.0586871Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda 2025-12-04T12:11:17.0587298Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group 2025-12-04T12:11:17.0587760Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl 2025-12-04T12:11:17.0588227Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T12:11:17.0588629Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward 2025-12-04T12:11:17.0589050Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false 2025-12-04T12:11:17.0589502Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce 2025-12-04T12:11:17.0589919Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device_mesh_initialization 2025-12-04T12:11:17.0590297Z Running 1 items in 
this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce 2025-12-04T12:11:17.0590666Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD 2025-12-04T12:11:17.0591038Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_join_model_equivalence 2025-12-04T12:11:17.0591450Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_err_ignore_params 2025-12-04T12:11:17.0591911Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_no_set_grad_none 2025-12-04T12:11:17.0592387Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_set_grad_to_none 2025-12-04T12:11:17.0592818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_bn_training_vs_eval 2025-12-04T12:11:17.0593198Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_exception 2025-12-04T12:11:17.0593591Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T12:11:17.0594013Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_unused_params_rebuild_buckets_exception 2025-12-04T12:11:17.0594420Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_detect_ddp_is_actually_static 2025-12-04T12:11:17.0594799Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks 2025-12-04T12:11:17.0595182Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T12:11:17.0595585Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks 2025-12-04T12:11:17.0595995Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view 2025-12-04T12:11:17.0596429Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_sum 2025-12-04T12:11:17.0596770Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T12:11:17.0597101Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T12:11:17.0597453Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T12:11:17.0597817Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T12:11:17.0598164Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sync_bn_logged 2025-12-04T12:11:17.0598537Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T12:11:17.0598760Z 2025-12-04T12:11:17.0598885Z Running distributed tests for the nccl backend with file init_method 2025-12-04T12:11:17.0599064Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:11:17.0599493Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:11:17.057370] 2025-12-04T12:14:58.2338077Z 2025-12-04T12:14:58.2339134Z distributed/test_distributed_spawn 1/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.7_44c21d2f43a0ec91_.log 2025-12-04T12:14:58.2353306Z Running 38 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_with_then_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_v_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device_mesh_initialization, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_join_model_equivalence, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_err_ignore_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_no_set_grad_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_set_grad_to_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_bn_training_vs_eval, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_exception, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_unused_params_rebuild_buckets_exception, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_detect_ddp_is_actually_static, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sync_bn_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T12:14:58.2362710Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel 2025-12-04T12:14:58.2363208Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_with_then_hook 2025-12-04T12:14:58.2363741Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda_complex 2025-12-04T12:14:58.2364166Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_v_cuda 2025-12-04T12:14:58.2364585Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T12:14:58.2365022Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex 2025-12-04T12:14:58.2365441Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda 2025-12-04T12:14:58.2365857Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T12:14:58.2366289Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda 2025-12-04T12:14:58.2366695Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group 2025-12-04T12:14:58.2367128Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl 2025-12-04T12:14:58.2367567Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T12:14:58.2367992Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward 
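The runs above repeat the same shard once per backend and rendezvous style ("env init_method" vs "file init_method"). For orientation only, a minimal sketch of what those two rendezvous styles mean at the public torch.distributed level, assuming an illustrative local setup (the harness's actual wiring is not part of this log):

    # Sketch only: "env://" vs "file://" rendezvous for init_process_group.
    # The MASTER_ADDR/MASTER_PORT defaults below are illustrative assumptions.
    import os
    import torch.distributed as dist

    def init_with_env(rank, world_size):
        # env:// rendezvous: the address and port come from environment
        # variables, which a spawning harness is expected to have set.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group(backend="gloo", init_method="env://",
                                rank=rank, world_size=world_size)

    def init_with_file(rank, world_size, store_path):
        # file:// rendezvous: all ranks agree on one shared file path
        # (e.g. on a shared filesystem) instead of a TCP endpoint.
        dist.init_process_group(backend="gloo",
                                init_method=f"file://{store_path}",
                                rank=rank, world_size=world_size)
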
2025-12-04T12:14:58.2368534Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false 2025-12-04T12:14:58.2369015Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce 2025-12-04T12:14:58.2369460Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device_mesh_initialization 2025-12-04T12:14:58.2369944Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce 2025-12-04T12:14:58.2370374Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD 2025-12-04T12:14:58.2370802Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_join_model_equivalence 2025-12-04T12:14:58.2371280Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_err_ignore_params 2025-12-04T12:14:58.2371818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_no_set_grad_none 2025-12-04T12:14:58.2372418Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_set_grad_to_none 2025-12-04T12:14:58.2372925Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_bn_training_vs_eval 2025-12-04T12:14:58.2373363Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_exception 2025-12-04T12:14:58.2373753Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T12:14:58.2374172Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_unused_params_rebuild_buckets_exception 2025-12-04T12:14:58.2374570Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_detect_ddp_is_actually_static 2025-12-04T12:14:58.2374943Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks 2025-12-04T12:14:58.2375319Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T12:14:58.2375720Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks 2025-12-04T12:14:58.2376123Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view 2025-12-04T12:14:58.2376505Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_sum 2025-12-04T12:14:58.2376839Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T12:14:58.2377166Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T12:14:58.2377509Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T12:14:58.2377865Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T12:14:58.2378208Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sync_bn_logged 2025-12-04T12:14:58.2378572Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T12:14:58.2378793Z 2025-12-04T12:14:58.2378917Z Running distributed tests for the gloo backend with env init_method 2025-12-04T12:14:58.2379090Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:14:58.2379518Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:14:58.234959] 2025-12-04T12:18:02.6547626Z 2025-12-04T12:18:02.6548799Z distributed/test_distributed_spawn 1/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.7_eb7d6b833d5733aa_.log 2025-12-04T12:18:02.6563062Z Running 38 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_with_then_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_v_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device_mesh_initialization, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_join_model_equivalence, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_err_ignore_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_no_set_grad_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_set_grad_to_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_bn_training_vs_eval, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_exception, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_unused_params_rebuild_buckets_exception, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_detect_ddp_is_actually_static, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sync_bn_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T12:18:02.6572303Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel 2025-12-04T12:18:02.6572804Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_with_then_hook 2025-12-04T12:18:02.6573300Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda_complex 2025-12-04T12:18:02.6573781Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_v_cuda 2025-12-04T12:18:02.6574210Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T12:18:02.6574648Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex 2025-12-04T12:18:02.6575070Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda 2025-12-04T12:18:02.6575492Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T12:18:02.6575927Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda 
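Every invocation above passes --shard-id and --num-shards (here shard 1 of 7), and each "Running N items in this shard" line reports what that split selected. As a hedged sketch, assuming a purely position-based round-robin split (the real test harness also balances shards using historical test timings), shard selection can be pictured as:

    # Sketch of shard selection analogous to "--shard-id=1 --num-shards=7".
    # Assumption: plain round-robin by position; shard ids are 1-based,
    # matching the flags in the log. Not the harness's actual algorithm.
    def select_shard(items, shard_id, num_shards):
        return [t for i, t in enumerate(items)
                if i % num_shards == shard_id - 1]

    tests = [f"test_{i}" for i in range(10)]
    # Shard 1 of 7 keeps positions 0 and 7 -> ['test_0', 'test_7']
    print(select_shard(tests, shard_id=1, num_shards=7))
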
2025-12-04T12:18:02.6576339Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group 2025-12-04T12:18:02.6576781Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl 2025-12-04T12:18:02.6577222Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T12:18:02.6577654Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward 2025-12-04T12:18:02.6578133Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false 2025-12-04T12:18:02.6578633Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce 2025-12-04T12:18:02.6579081Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device_mesh_initialization 2025-12-04T12:18:02.6579525Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce 2025-12-04T12:18:02.6580009Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD 2025-12-04T12:18:02.6580447Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_join_model_equivalence 2025-12-04T12:18:02.6580938Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_err_ignore_params 2025-12-04T12:18:02.6581538Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_no_set_grad_none 2025-12-04T12:18:02.6582116Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_set_grad_to_none 2025-12-04T12:18:02.6582608Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_bn_training_vs_eval 2025-12-04T12:18:02.6582980Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_exception 2025-12-04T12:18:02.6583375Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T12:18:02.6583797Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_unused_params_rebuild_buckets_exception 2025-12-04T12:18:02.6584207Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_detect_ddp_is_actually_static 2025-12-04T12:18:02.6584617Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks 2025-12-04T12:18:02.6584994Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T12:18:02.6585382Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks 2025-12-04T12:18:02.6585789Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view 2025-12-04T12:18:02.6586170Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_sum 2025-12-04T12:18:02.6586511Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T12:18:02.6586840Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T12:18:02.6587184Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T12:18:02.6587550Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T12:18:02.6587894Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sync_bn_logged 2025-12-04T12:18:02.6588269Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger 2025-12-04T12:18:02.6588495Z 2025-12-04T12:18:02.6588583Z Running distributed tests for the gloo backend with file init_method 2025-12-04T12:18:02.6588762Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:18:02.6589190Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:18:02.656056] 2025-12-04T12:21:06.5918373Z 2025-12-04T12:21:06.5919340Z distributed/test_distributed_spawn 1/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.7_fdddebdc32b908ed_.log 2025-12-04T12:21:06.5928521Z Running 38 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_with_then_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_v_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_ring_exchange_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_grad_as_bucket_view_false, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_buffer_hook_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_device_mesh_initialization, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_join_model_equivalence, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_err_ignore_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_grad_as_bucket_view_no_set_grad_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_no_grad_as_bucket_view_set_grad_to_none, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_bn_training_vs_eval, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_exception, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_unused_params_rebuild_buckets_exception, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_detect_ddp_is_actually_static, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_different_graph_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, 
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_wait_all_ranks,
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_grad_is_view,
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_sum,
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum,
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda,
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler,
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu,
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sync_bn_logged,
  test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_with_logger
2025-12-04T12:21:06.5936237Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel
  (one such record per test in this shard's list follows, 38 in all, ending with test_verify_model_across_rank_with_logger)
2025-12-04T12:21:06.5951531Z Finished distributed/test_distributed_spawn 1/7 ... [2025-12-04 12:21:06.592555][2236009.067579281], took 13.60min
2025-12-04T12:21:06.5951978Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml
2025-12-04T12:21:06.5956581Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T12:21:06.5956815Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading
2025-12-04T12:21:06.5956997Z Uploading artifacts took 0.00 seconds
2025-12-04T12:21:06.5959998Z Running distributed/test_distributed_spawn 4/7 ... [2025-12-04 12:21:06.595907][2236009.070937629]
2025-12-04T12:21:06.5961155Z MPI not available -- MPI backend tests will be skipped
2025-12-04T12:21:06.5961854Z Running distributed tests for the test backend with env init_method
2025-12-04T12:21:06.5962583Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:21:06.5964571Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:21:06.596311]
2025-12-04T12:21:08.6190344Z distributed/test_distributed_spawn 4/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.7_ea935e42f87edee9_.log
2025-12-04T12:21:08.6190714Z Running 0 items in this shard:
2025-12-04T12:21:08.6195174Z Running distributed tests for the test backend with file init_method
2025-12-04T12:21:08.6195982Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:21:08.6199022Z Executing [same command as above] ... [2025-12-04 12:21:08.619719]
2025-12-04T12:21:10.5906212Z distributed/test_distributed_spawn 4/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.7_ce17b463187b57d4_.log
2025-12-04T12:21:10.5906931Z Running 0 items in this shard:
2025-12-04T12:21:10.5907782Z Running distributed tests for the nccl backend with env init_method
2025-12-04T12:21:10.5908598Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:21:10.5913732Z Executing [same command as above] ... [2025-12-04 12:21:10.590946]
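The "Executing [...]" records above show the exact command the test harness launches once per (backend, init_method) combination. A minimal sketch of that invocation in Python, assuming a hypothetical run_shard wrapper; the flag list is copied verbatim from the log, everything else is illustrative:

    # Illustrative only: re-creates the per-shard pytest invocation the log shows.
    # run_shard is a hypothetical helper, not PyTorch's actual harness code.
    import subprocess
    import sys

    def run_shard(shard_id: int, num_shards: int) -> int:
        cmd = [
            sys.executable, "-bb", "distributed/test_distributed_spawn.py",
            f"--shard-id={shard_id}", f"--num-shards={num_shards}",
            "-v", "--subprocess", "-vv", "-rfEX", "-p", "no:xdist",
            "--use-pytest", "-x", "--reruns=0",
            "--import-slow-tests", "--import-disabled-tests",
        ]
        # check=False: the caller inspects the return code, mirroring the log's
        # "was successful" reporting rather than raising on failure.
        return subprocess.run(cmd, check=False).returncode

Note that "-bb" makes Python raise on bytes/str comparisons, and "-x" plus "--reruns=0" means the run stops at the first failure with no automatic retries.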
2025-12-04T12:24:15.9889341Z distributed/test_distributed_spawn 4/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.7_e256b6d38c9977d6_.log
2025-12-04T12:24:15.9904452Z Running 39 items in this shard (all under test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn):
  test_DistributedDataParallelCPU
  test_DistributedDataParallel_SyncBatchNorm_2D_Input
  test_DistributedDataParallel_with_amp_and_grad_is_view
  test_all_gather_coalesced_full_group
  test_all_gather_coalesced_simple
  test_all_gather_coalesced_with_empty
  test_all_gather_into_stack_tensor_cuda
  test_all_reduce_group_min
  test_all_reduce_group_sum
  test_all_reduce_product
  test_all_to_all
  test_all_to_all_complex
  test_all_to_all_full_group
  test_all_to_all_single_equal_split
  test_all_to_all_single_equal_split_full_group_cuda
  test_all_to_all_single_equal_split_group
  test_all_to_all_single_unequal_split_full_group_cuda
  test_all_to_all_single_unequal_split_group_cuda
  test_barrier_cuda
  test_barrier_full_group_cuda
  test_barrier_timeout_group
  test_batch_isend_irecv_gloo
  test_ddp_compile_static_graph
  test_ddp_device
  test_ddp_logging_data_gpu
  test_ddp_model_diff_shape_across_ranks
  test_ddp_new_tensor_in_fwd_static_graph
  test_destroy_full_group
  test_gather
  test_get_future
  test_get_rank_size_group
  test_isend_autograd_profiler
  test_monitored_barrier_allreduce_hang_wait_all_ranks
  test_nccl_backend_bool_reduce
  test_new_subgroups_with_group_param
  test_reduce_sum_cuda
  test_scatter
  test_scatter_complex
  test_send_recv_nccl_autograd_profiler
2025-12-04T12:24:15.9913116Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU
  (one such record per test listed above follows, 39 in all)
2025-12-04T12:24:15.9928790Z Running distributed tests for the nccl backend with file init_method
2025-12-04T12:24:15.9928963Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:24:15.9929390Z Executing [same command as above] ... [2025-12-04 12:24:15.989833]
2025-12-04T12:27:20.4185948Z distributed/test_distributed_spawn 4/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.7_21d98e678f106fba_.log
2025-12-04T12:27:20.4201165Z Running 39 items in this shard: (identical to the 39-item list above)
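Every backend run above reports the identical 39-item list because shard membership is computed deterministically from the selected tests and the --shard-id/--num-shards arguments, independent of which backend is being exercised. A generic sketch of such a scheme; this is one common approach, not necessarily the exact assignment PyTorch's CI uses (which may also weight by historical test timing):

    # Illustrative deterministic sharding: a pure function of the sorted test ids,
    # so repeated runs with the same arguments always select the same tests.
    def shard(tests: list[str], shard_id: int, num_shards: int) -> list[str]:
        ordered = sorted(tests)
        # shard_id is 1-based in this log ("--shard-id=4", "--num-shards=7").
        return [t for i, t in enumerate(ordered) if i % num_shards == shard_id - 1]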
2025-12-04T12:27:20.4210532Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU
  (one such record per test listed above follows, 39 in all)
2025-12-04T12:27:20.4226877Z Running distributed tests for the gloo backend with env init_method
2025-12-04T12:27:20.4227049Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:27:20.4227477Z Executing [same command as above] ... [2025-12-04 12:27:20.419652]
2025-12-04T12:30:12.3634409Z distributed/test_distributed_spawn 4/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.7_fa2eebc2f0a1face_.log
2025-12-04T12:30:12.3649106Z Running 39 items in this shard: (identical to the 39-item list above)
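Each backend is exercised twice, once "with env init_method" and once "with file init_method"; these are the two standard rendezvous mechanisms for torch.distributed process groups. A minimal sketch of the difference, with an illustrative file path, world size, and backend choice (the real tests parameterize all of these):

    # Minimal sketch of env:// vs file:// rendezvous for torch.distributed.
    # The shared file path and defaults below are illustrative assumptions.
    import os
    import torch.distributed as dist

    def init(rank: int, world_size: int, use_file: bool) -> None:
        if use_file:
            # "file init_method": every rank opens the same shared file to rendezvous.
            dist.init_process_group(
                backend="gloo",
                init_method="file:///tmp/dist_init_example",
                rank=rank, world_size=world_size,
            )
        else:
            # "env init_method": rendezvous info comes from environment variables.
            os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
            os.environ.setdefault("MASTER_PORT", "29500")
            dist.init_process_group(
                backend="gloo", init_method="env://",
                rank=rank, world_size=world_size,
            )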
2025-12-04T12:30:12.3658171Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU
  (one such record per test listed above follows, 39 in all)
2025-12-04T12:30:12.3674598Z Running distributed tests for the gloo backend with file init_method
2025-12-04T12:30:12.3674768Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:30:12.3675194Z Executing [same command as above] ... [2025-12-04 12:30:12.364516]
2025-12-04T12:33:05.4788862Z distributed/test_distributed_spawn 4/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.7_71dda5efe4a5bacf_.log
2025-12-04T12:33:05.4799759Z Running 39 items in this shard: (identical to the 39-item list above)
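The repeated "Running 1 items in this shard" records reflect the --subprocess flag: rather than running all selected tests in one process, the harness launches a fresh process per test so a crash or leaked state in one test cannot poison the others. A sketch of that isolation pattern with a hypothetical helper; the exact mechanics of the harness's --subprocess handling may differ:

    # Illustrative per-test isolation: one fresh pytest process per test id,
    # consistent with the per-test records above. Hypothetical helper.
    import subprocess
    import sys

    def run_each_in_subprocess(test_ids: list[str]) -> dict[str, int]:
        results = {}
        for tid in test_ids:
            proc = subprocess.run(
                [sys.executable, "-m", "pytest", "-x", "-rfEX", tid],
                capture_output=True, text=True,
            )
            # A nonzero return code marks this single test's run as failed.
            results[tid] = proc.returncode
        return results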
2025-12-04T12:33:05.4807808Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU
  (one such record per test listed above follows, 39 in all)
2025-12-04T12:33:05.4823580Z Finished distributed/test_distributed_spawn 4/7 ... [2025-12-04 12:33:05.479642][2236727.954667306], took 11.98min
2025-12-04T12:33:05.4824054Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml
2025-12-04T12:33:05.4827872Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T12:33:05.4831524Z Running distributed/test_distributed_spawn 7/7 ... [2025-12-04 12:33:05.483066][2236727.958096222]
2025-12-04T12:33:05.4831917Z MPI not available -- MPI backend tests will be skipped
2025-12-04T12:33:05.4833392Z Running distributed tests for the test backend with env init_method
2025-12-04T12:33:05.4833976Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:33:05.4835825Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:33:05.483453]
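The "Parsing testcases" and "Failed to parse and upload json test reports: Unable to locate credentials" records above show the report step: junit-style XML is parsed locally, then the upload is skipped when AWS credentials are absent ("Unable to locate credentials" is the standard botocore message). A sketch of that flow; the bucket name, key, and serialization are illustrative assumptions, not the CI's actual upload code:

    # Illustrative report parsing + credential-gated upload.
    import xml.etree.ElementTree as ET

    def parse_junit(path: str) -> list[dict]:
        root = ET.parse(path).getroot()
        # pytest's junit XML nests <testcase> elements under <testsuite>.
        return [
            {"classname": tc.get("classname"), "name": tc.get("name"),
             "time": float(tc.get("time", 0.0))}
            for tc in root.iter("testcase")
        ]

    def upload(cases: list[dict], key: str) -> None:
        import boto3
        from botocore.exceptions import NoCredentialsError
        try:
            boto3.client("s3").put_object(
                Bucket="example-test-reports",  # illustrative bucket name
                Key=key, Body=repr(cases).encode(),
            )
        except NoCredentialsError as err:
            # Matches the log: parsing succeeds locally, upload is skipped.
            print(f"Failed to parse and upload json test reports: {err}")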
2025-12-04T12:33:07.4899916Z distributed/test_distributed_spawn 7/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.7_d1ca1e5163536222_.log
2025-12-04T12:33:07.4901123Z Running 0 items in this shard:
2025-12-04T12:33:07.4904531Z Running distributed tests for the test backend with file init_method
2025-12-04T12:33:07.4905009Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:33:07.4907933Z Executing [same command as above] ... [2025-12-04 12:33:07.490627]
2025-12-04T12:33:09.5133914Z distributed/test_distributed_spawn 7/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.7_92ca3f9f24c090e3_.log
2025-12-04T12:33:09.5134724Z Running 0 items in this shard:
2025-12-04T12:33:09.5139480Z Running distributed tests for the nccl backend with env init_method
2025-12-04T12:33:09.5140773Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:33:09.5143454Z Executing [same command as above] ... [2025-12-04 12:33:09.514197]
2025-12-04T12:36:22.4652356Z distributed/test_distributed_spawn 7/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.7_7fa9032f289fcabf_.log
2025-12-04T12:36:22.4661324Z Running 34 items in this shard (all under test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn):
  test_DistributedDataParallelCPU_grad_is_view
  test_DistributedDataParallel_SyncBatchNorm
  test_accumulate_gradients_no_sync
  test_accumulate_gradients_no_sync_allreduce_hook
  test_accumulate_gradients_no_sync_grad_is_view
  test_all_gather_coalesced_group
  test_all_gather_group
  test_all_reduce_coalesced_full_group_min
  test_all_reduce_coalesced_group_max
  test_all_reduce_coalesced_group_sum
  test_all_reduce_full_group_sum
  test_barrier_group_cuda
  test_batch_isend_irecv_op_err
  test_compute_bucket_assignment_by_size_sparse_error_with_logger
  test_ddp_comm_hook_logging
  test_ddp_hook_parity_post_localSGD
  test_ddp_multiple_nested_unused_params_error
  test_ddp_namedtuple
  test_ddp_static_graph_nested_types
  test_get_backend
  test_get_data_parallel_params
  test_grads_same_across_ranks_with_no_sync
  test_invalid_static_graph
  test_monitored_barrier_allreduce_hang
  test_monitored_barrier_gloo_rank_0_timeout
  test_nccl_backend_bool_broadcast
  test_new_subgroups_by_enumeration
  test_new_subgroups_by_enumeration_negative_input_rank
  test_new_subgroups_overlap_not_allowed
  test_post_localSGD_optimizer_parity_with_hierarchical_sgd
  test_reduce_full_group_product
  test_reduce_scatter_v_cuda
  test_sparse_all_reduce_sum
  test_undefined_grad_parity_unused_parameters
2025-12-04T12:36:22.4668641Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU_grad_is_view
  (one such record per test listed above follows, 34 in all)
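As noted at each shard's startup ("MPI not available -- MPI backend tests will be skipped"), the harness probes which distributed backends the torch build supports before scheduling backend-specific runs. A sketch of that probing using real torch.distributed availability helpers; the loop and message formatting are illustrative:

    # Sketch of backend availability probing. is_mpi_available, is_nccl_available,
    # and is_gloo_available are real torch.distributed APIs.
    import torch.distributed as dist

    for name, available in [
        ("mpi", dist.is_mpi_available()),
        ("nccl", dist.is_nccl_available()),
        ("gloo", dist.is_gloo_available()),
    ]:
        if not available:
            print(f"{name} not available -- {name} backend tests will be skipped")

On this ROCm runner, NCCL calls are served by RCCL under the same torch.distributed "nccl" backend name, which is why nccl-named tests run on AMD GPUs.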
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T12:36:22.4677941Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_sum 2025-12-04T12:36:22.4678410Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_sum 2025-12-04T12:36:22.4678769Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_group_cuda 2025-12-04T12:36:22.4679121Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_err 2025-12-04T12:36:22.4679527Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T12:36:22.4680019Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_comm_hook_logging 2025-12-04T12:36:22.4680386Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_post_localSGD 2025-12-04T12:36:22.4680780Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_multiple_nested_unused_params_error 2025-12-04T12:36:22.4681195Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_namedtuple 2025-12-04T12:36:22.4681552Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_static_graph_nested_types 2025-12-04T12:36:22.4681905Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_backend 2025-12-04T12:36:22.4682246Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T12:36:22.4682624Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_grads_same_across_ranks_with_no_sync 2025-12-04T12:36:22.4682996Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T12:36:22.4683379Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang 2025-12-04T12:36:22.4683776Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_rank_0_timeout 2025-12-04T12:36:22.4684160Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_broadcast 2025-12-04T12:36:22.4684531Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration 2025-12-04T12:36:22.4684932Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_by_enumeration_negative_input_rank 2025-12-04T12:36:22.4685342Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_overlap_not_allowed 2025-12-04T12:36:22.4685759Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd
2025-12-04T12:36:22.4686166Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product
2025-12-04T12:36:22.4686525Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_scatter_v_cuda
2025-12-04T12:36:22.4686878Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum
2025-12-04T12:36:22.4687258Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_undefined_grad_parity_unused_parameters
2025-12-04T12:36:22.4687483Z
2025-12-04T12:36:22.4687571Z Running distributed tests for the nccl backend with file init_method
2025-12-04T12:36:22.4687743Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:36:22.4688205Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:22.466459]
2025-12-04T12:39:35.5661759Z
2025-12-04T12:39:35.5662664Z distributed/test_distributed_spawn 7/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.7_32b635118fdac716_.log
2025-12-04T12:39:35.5696974Z
2025-12-04T12:39:35.5697065Z Running distributed tests for the gloo backend with env init_method
2025-12-04T12:39:35.5697234Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:39:35.5697660Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:39:35.567467]
2025-12-04T12:42:35.1731077Z
2025-12-04T12:42:35.1732413Z distributed/test_distributed_spawn 7/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.7_a20afa8964723c8a_.log
2025-12-04T12:42:35.1771536Z
2025-12-04T12:42:35.1771635Z Running distributed tests for the gloo backend with file init_method
2025-12-04T12:42:35.1771822Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:42:35.1772287Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:42:35.174458]
2025-12-04T12:45:32.9737139Z
2025-12-04T12:45:32.9738145Z distributed/test_distributed_spawn 7/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.7_527ac0300c18080a_.log
2025-12-04T12:45:32.9773897Z
2025-12-04T12:45:32.9774030Z Finished distributed/test_distributed_spawn 7/7 ...
[2025-12-04 12:45:32.974501][2237475.449525523], took 12.46min 2025-12-04T12:45:32.9774471Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T12:45:32.9778333Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:45:32.9778558Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T12:45:32.9778740Z Uploading artifacts took 0.00 seconds 2025-12-04T12:45:32.9783825Z Running distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 ... [2025-12-04 12:45:32.978237][2237475.453250186] 2025-12-04T12:45:32.9784044Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:45:32.9788348Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_hsdp_dtensor_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:45:32.978715] 2025-12-04T12:50:37.6206889Z 2025-12-04T12:50:37.6207564Z PRINTING LOG FILE of distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_e5c237ac1f49bda1_.log) 2025-12-04T12:50:37.6208127Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-99ec26399339c1a4.xml 2025-12-04T12:50:37.6208468Z ============================= test session starts ============================== 2025-12-04T12:50:37.6208691Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6208921Z cachedir: .pytest_cache 2025-12-04T12:50:37.6209152Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6209396Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6209518Z configfile: pytest.ini 2025-12-04T12:50:37.6209826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6210073Z collecting ... 
collected 8 items
2025-12-04T12:50:37.6210220Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T12:50:37.6212805Z Running 8 items in this shard: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda
2025-12-04T12:50:37.6214839Z
2025-12-04T12:50:37.6215228Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 12:45:34.683000 396705 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 396774
2025-12-04T12:50:37.6215802Z I1204 12:45:34.684000 396705 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 396775
2025-12-04T12:50:37.6216147Z I1204 12:45:34.685000 396705 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 396776
2025-12-04T12:50:37.6216494Z I1204 12:45:34.685000 396705 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 396777
2025-12-04T12:50:37.6217399Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.6218181Z FSDP.set_state_dict_type(
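The FutureWarning above names its own replacement, torch.distributed.checkpoint.state_dict.get_state_dict()/set_state_dict(). A minimal sketch of that migration, assuming a generic FSDP-wrapped model and optimizer (the function name, the StateDictOptions flags, and the save step are illustrative, not taken from this test):

    import torch
    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    def checkpoint_roundtrip(model: torch.nn.Module, optim: torch.optim.Optimizer):
        # Instead of FSDP.set_state_dict_type(...), ask for sharded state dicts
        # directly; cpu_offload=True plays the role of offload_to_cpu above.
        opts = StateDictOptions(full_state_dict=False, cpu_offload=True)
        model_sd, optim_sd = get_state_dict(model, optim, options=opts)
        # ... persist model_sd/optim_sd, e.g. via torch.distributed.checkpoint ...
        set_state_dict(
            model, optim,
            model_state_dict=model_sd, optim_state_dict=optim_sd,
            options=opts,
        )

The same call pair works for FSDP1, FSDP2, and DDP, which is why the warning recommends it over the per-wrapper set_state_dict_type contexts.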
2025-12-04T12:50:37.6230371Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T12:50:37.6231793Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
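The UserWarning above spells out its own remedies: drop lingering references to the autograd graph, or perform DDP initialization on the stream later used for forwards. Only when the mismatch is deliberate does the toggle it names apply; a one-line sketch, guarded with hasattr since the switch only exists in builds new enough to emit this warning:

    import torch

    # Opt-in suppression only: this hides the warning, it does not remove the
    # extra synchronization a genuine stream mismatch can cause.
    if hasattr(torch.autograd.graph, "set_warn_on_accumulate_grad_stream_mismatch"):
        torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)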
2025-12-04T12:50:37.6234974Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.6235342Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.6235821Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.6236288Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.6236760Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.6237217Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.6237647Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6238099Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.6238549Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6238995Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.6239445Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.6239965Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.6240413Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.6240889Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.6241609Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984.
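This RuntimeError comes from the mem_leak_check wrapper (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 in the repro line that follows): it snapshots caching-allocator and driver-level memory around the test body and fails if either grew. A rough sketch of that bookkeeping, with illustrative checks rather than PyTorch's exact policy:

    import torch

    def assert_no_cuda_leak(fn, device: int = 0):
        if not torch.cuda.is_available():
            return fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)  # caching allocator view
        free_before, _ = torch.cuda.mem_get_info(device)    # driver-level view
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible leak: allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver free {free_before} -> {free_after} bytes"
            )

The two-level comparison is why the message reports both "Caching allocator allocated memory" and "CUDA driver allocated memory": tensors still referenced show up in the allocator count, while context or library growth shows up only at the driver.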
2025-12-04T12:50:37.6242314Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6242655Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6243313Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6243888Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6244247Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6244648Z E1204 12:45:43.148000 396775 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6245050Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6245373Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6245847Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6246311Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6246793Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6247238Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6247663Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6248110Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6248557Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6249002Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6249452Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6249946Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6250384Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6250832Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6251581Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1254096896 and is now 3076521984. 2025-12-04T12:50:37.6252248Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6252582Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6253233Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6253799Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6254150Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6254588Z E1204 12:45:43.155000 396776 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6254917Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6255239Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6255709Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6256173Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6256637Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6257069Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6257490Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6257940Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6258388Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:50:37.6258838Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6259285Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6259774Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6260211Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6260661Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6261387Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3235905536. 2025-12-04T12:50:37.6262048Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6262383Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6263040Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6263606Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6263995Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6264393Z E1204 12:45:43.159000 396774 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6264721Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6265043Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6265512Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6265976Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6266446Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6266878Z E1204 12:45:43.160000 396777 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6267303Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6267749Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6268198Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6268644Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6269091Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6269527Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6270001Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6270574Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6271277Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3076521984. 
2025-12-04T12:50:37.6271939Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6272273Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6272924Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6273519Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6273866Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6274262Z E1204 12:45:43.160000 396777 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6274500Z FAILED [9.7151s] [ 12%] 2025-12-04T12:50:37.6274568Z 2025-12-04T12:50:37.6274628Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6274873Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:50:37.6275107Z Traceback (most recent call last): 2025-12-04T12:50:37.6275357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6275604Z self._join_processes(fn) 2025-12-04T12:50:37.6275849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6276114Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6276382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6276643Z raise RuntimeError(error) 2025-12-04T12:50:37.6276796Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6276959Z Traceback (most recent call last): 2025-12-04T12:50:37.6277199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6277440Z getattr(self, test_name)() 2025-12-04T12:50:37.6277672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6277907Z fn() 2025-12-04T12:50:37.6278108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6278340Z method(*args, **kwargs) 2025-12-04T12:50:37.6278559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6278790Z method(*args, **kwargs) 2025-12-04T12:50:37.6279008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6279235Z with policy(): 2025-12-04T12:50:37.6279448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6279784Z raise RuntimeError(msg) 2025-12-04T12:50:37.6280255Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984. 2025-12-04T12:50:37.6280695Z 2025-12-04T12:50:37.6280772Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6281190Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6281531Z 2025-12-04T12:50:37.6281621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6281745Z 2025-12-04T12:50:37.6281747Z 2025-12-04T12:50:37.6281830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6282033Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6282461Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-99ec26399339c1a4.xml - 2025-12-04T12:50:37.6282828Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6283253Z FAILED [9.7151s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6283652Z Traceback (most recent call last): 2025-12-04T12:50:37.6283899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6284144Z getattr(self, test_name)() 2025-12-04T12:50:37.6284380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6284615Z fn() 2025-12-04T12:50:37.6284817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6285048Z method(*args, **kwargs) 2025-12-04T12:50:37.6285271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6285500Z method(*args, **kwargs) 2025-12-04T12:50:37.6285717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6285996Z with policy(): 2025-12-04T12:50:37.6286207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6286438Z raise RuntimeError(msg) 2025-12-04T12:50:37.6286911Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984. 
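Note: the RuntimeError above comes from the harness's CUDA memory-leak check, enabled here by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1. It records the caching-allocator and driver-level memory counters before the test body and compares them again on exit; in this run the allocator count went from 0 to 13824 bytes on every rank's device. Below is a minimal sketch of that before/after pattern written against the public torch.cuda API. The actual check in common_utils.py is more thorough (it walks all visible devices and re-measures before declaring a leak), so treat this as an illustration, not the real implementation:

    import torch

    class MemLeakCheck:
        """Sketch of a before/after CUDA memory-leak check (single device).

        Illustrative only: the real CI check covers every visible device
        and retries before declaring a leak.
        """

        def __enter__(self):
            torch.cuda.synchronize()
            self.alloc_before = torch.cuda.memory_allocated()
            free, total = torch.cuda.mem_get_info()   # driver-level view
            self.driver_before = total - free
            return self

        def __exit__(self, exc_type, exc, tb):
            if exc_type is not None:
                return False                          # don't mask test errors
            torch.cuda.synchronize()
            torch.cuda.empty_cache()                  # drop cached blocks first
            alloc_after = torch.cuda.memory_allocated()
            free, total = torch.cuda.mem_get_info()
            if alloc_after > self.alloc_before:
                raise RuntimeError(
                    f"leak: caching allocator {self.alloc_before} -> {alloc_after}, "
                    f"driver {self.driver_before} -> {total - free}"
                )
            return False

Under this reading, the repro command the log prints simply re-runs the single test with the same check armed.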
Retrying single test...
Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7e4ff3d353e60738.xml
============================= test session starts ==============================
platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/pytorch
configfile: pytest.ini
plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
collecting ... collected 8 items / 7 deselected / 1 selected
stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda
Running 1 items in this shard

distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda
I1204 12:45:46.778000 397175 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 397244
I1204 12:45:46.778000 397175 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 397245
I1204 12:45:46.779000 397175 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 397246
I1204 12:45:46.780000 397175 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 397247
/var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use the APIs get_state_dict() and set_state_dict(), which can support different parallelisms (FSDP1, FSDP2, DDP). API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  FSDP.set_state_dict_type(
[... the same FutureWarning is printed by each of the other three ranks ...]
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[... the same UserWarning is printed by each of the other three ranks ...]
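Note: the FutureWarning points at the torch.distributed.checkpoint state-dict APIs as the replacement for FSDP.set_state_dict_type(). A hedged sketch of that migration follows; the model and optim arguments are placeholders, and cpu_offload is assumed here to correspond to the offload_to_cpu parameter this test sweeps:

    import torch
    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    def roundtrip_state_dict(model: torch.nn.Module, optim: torch.optim.Optimizer) -> None:
        opts = StateDictOptions(cpu_offload=True)
        # Replacement for FSDP.set_state_dict_type(...) + model.state_dict():
        model_sd, optim_sd = get_state_dict(model, optim, options=opts)
        # Replacement for the corresponding load path:
        set_state_dict(
            model,
            optim,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
            options=opts,
        )

These APIs take the FSDP-wrapped module and its optimizer directly, so no per-wrapper state-dict-type configuration is needed.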
E1204 12:45:55.183000 397244 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[... same leak traceback and repro instructions as on the first attempt, repeated by process 397244 ...]
E1204 12:45:55.183000 397244 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3242196992.
E1204 12:45:55.183000 397244 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
E1204 12:45:55.191000 397245 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[... same leak traceback and repro instructions, repeated by process 397245 ...]
E1204 12:45:55.191000 397245 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984.
E1204 12:45:55.191000 397245 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
E1204 12:45:55.196000 397247 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[... same leak traceback and repro instructions, repeated by process 397247 ...]
E1204 12:45:55.196000 397247 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1251999744 and is now 3076521984.
E1204 12:45:55.196000 397247 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
E1204 12:45:55.250000 397246 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[... same leak traceback and repro instructions, repeated by process 397246 ...]
E1204 12:45:55.250000 397246 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3076521984.
E1204 12:45:55.250000 397246 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
FAILED [9.6158s] [100%]

=================================== FAILURES ===================================
_ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
    self._join_processes(fn)
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
    self._check_return_codes(fn, elapsed_time)
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
    raise RuntimeError(error)
RuntimeError: Process 0 exited with error code 10 and exception:
[... same leak traceback as above for device 0 (driver memory 1438646272 -> 3242196992) ...]

To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------- Captured stdout call -----------------------------
Process 0 terminated with exit code 10, terminating remaining processes.
- generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7e4ff3d353e60738.xml -
=========================== short test summary info ============================
FAILED [9.6158s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
[... same traceback and repro instructions, repeated verbatim in the summary ...]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================= 1 failed, 7 deselected in 9.63s ========================
Got exit code 1
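Note: the parent traceback above (wrapper, then _join_processes, then _check_return_codes) shows how the harness surfaces these failures: the parent joins every spawned rank and raises as soon as one of them exits nonzero, which is why a leak detected inside a child becomes "Process 0 exited with error code 10" at the pytest level. A minimal sketch of that join-and-check pattern with plain multiprocessing (illustrative names, not the common_distributed.py implementation):

    import multiprocessing as mp

    def _run_rank(rank: int) -> None:
        # Stand-in for the per-rank test body; exit code 10 mimics the
        # leak-check failure recorded in this log.
        raise SystemExit(10 if rank == 0 else 0)

    def run_test(world_size: int = 4) -> None:
        procs = [mp.Process(target=_run_rank, args=(r,)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # _check_return_codes-style pass: any nonzero child fails the test.
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

    if __name__ == "__main__":
        run_test()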
Retrying single test...
Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-45c4083bb04a1124.xml
============================= test session starts ==============================
platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/pytorch
configfile: pytest.ini
plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
collecting ... collected 8 items / 7 deselected / 1 selected
stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda
Running 1 items in this shard

distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda
I1204 12:45:58.868000 397645 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 397714
I1204 12:45:58.869000 397645 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 397715
I1204 12:45:58.870000 397645 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 397716
I1204 12:45:58.870000 397645 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 397717
[... each rank again prints the FutureWarning from test_hsdp_dtensor_state_dict.py:243 and the AccumulateGrad UserWarning from torch/autograd/graph.py:865, exactly as on the previous attempt ...]
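Note: the AccumulateGrad stream-mismatch UserWarning repeats once per rank on this attempt as well. As the warning text itself says, it can be silenced globally when the mismatch is known to be intentional; a one-line sketch using the toggle named in the warning:

    import torch

    # Suppress the stream-mismatch warning quoted above. Only appropriate
    # once the mismatch has been confirmed to be intentional.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)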
E1204 12:46:07.762000 397717 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[... same leak traceback and repro instructions, repeated by process 397717 ...]
E1204 12:46:07.762000 397717 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3076521984.
E1204 12:46:07.762000 397717 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
E1204 12:46:07.771000 397714 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[... same leak traceback and repro instructions, repeated by process 397714 ...]
E1204 12:46:07.771000 397714 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3240099840.
E1204 12:46:07.771000 397714 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
E1204 12:46:07.783000 397715 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[... same leak traceback and repro instructions, repeated by process 397715 ...]
E1204 12:46:07.783000 397715 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984.
E1204 12:46:07.783000 397715 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
E1204 12:46:07.785000 397716
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6432167Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6432620Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6433068Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6433512Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6433960Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6434423Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6434860Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6435311Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6436007Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3076521984. 
2025-12-04T12:50:37.6436665Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6437001Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6437684Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6438248Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6438595Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6438992Z E1204 12:46:07.785000 397716 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6439230Z FAILED [10.1139s] [100%] 2025-12-04T12:50:37.6439297Z 2025-12-04T12:50:37.6439358Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6439601Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:50:37.6439869Z Traceback (most recent call last): 2025-12-04T12:50:37.6440113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6440356Z self._join_processes(fn) 2025-12-04T12:50:37.6440602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6440865Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6441135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6441395Z raise RuntimeError(error) 2025-12-04T12:50:37.6441548Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.6441709Z Traceback (most recent call last): 2025-12-04T12:50:37.6441947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6442187Z getattr(self, test_name)() 2025-12-04T12:50:37.6442418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6442648Z fn() 2025-12-04T12:50:37.6442848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6443077Z method(*args, **kwargs) 2025-12-04T12:50:37.6443298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6443529Z method(*args, **kwargs) 2025-12-04T12:50:37.6443780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6444010Z with policy(): 2025-12-04T12:50:37.6444222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6444453Z raise RuntimeError(msg) 2025-12-04T12:50:37.6444923Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3240099840. 2025-12-04T12:50:37.6445360Z 2025-12-04T12:50:37.6445435Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6445852Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6446224Z 2025-12-04T12:50:37.6446312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6446437Z 2025-12-04T12:50:37.6446438Z 2025-12-04T12:50:37.6446515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6446717Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6447111Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-45c4083bb04a1124.xml - 2025-12-04T12:50:37.6447476Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6447893Z FAILED [10.1139s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.6448289Z Traceback (most recent call last): 2025-12-04T12:50:37.6448533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6448775Z getattr(self, test_name)() 2025-12-04T12:50:37.6449007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6449238Z fn() 2025-12-04T12:50:37.6449440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6449668Z method(*args, **kwargs) 2025-12-04T12:50:37.6449930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6450157Z method(*args, **kwargs) 2025-12-04T12:50:37.6450374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6450600Z with policy(): 2025-12-04T12:50:37.6450810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6451038Z raise RuntimeError(msg) 2025-12-04T12:50:37.6451506Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3240099840. 
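The repro command printed in the failure above can also be scripted. A hedged sketch follows: the env var names and test id are copied verbatim from the log's repro line, and the invocation assumes the current working directory is the base repo dir.

    import os
    import subprocess
    import sys

    # Re-run the failing test exactly as the log's repro line suggests.
    env = dict(os.environ,
               PYTORCH_TEST_WITH_ROCM="1",
               PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    subprocess.run(
        [sys.executable,
         "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
         "TestHSDPWithDeviceMeshAndDTensorCUDA."
         "test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda"],
        env=env, check=True)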
2025-12-04T12:50:37.6451939Z 2025-12-04T12:50:37.6452016Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6452462Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6452803Z 2025-12-04T12:50:37.6452891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6453078Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6453243Z ======================= 1 failed, 7 deselected in 10.12s ======================= 2025-12-04T12:50:37.6453381Z Got exit code 1 2025-12-04T12:50:37.6453692Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6454107Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:50:37.6454501Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-33df3437092a25cb.xml 2025-12-04T12:50:37.6454857Z ============================= test session starts ============================== 2025-12-04T12:50:37.6455069Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6455257Z cachedir: .pytest_cache 2025-12-04T12:50:37.6455482Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6455721Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6455839Z configfile: pytest.ini 2025-12-04T12:50:37.6456063Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6456333Z collecting ... collected 8 items / 1 deselected / 7 selected 2025-12-04T12:50:37.6456492Z stepcurrent: skipping 1 already run items. 2025-12-04T12:50:37.6456622Z Running 7 items in this shard 2025-12-04T12:50:37.6456700Z 2025-12-04T12:50:37.6457079Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 12:46:11.385000 398115 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 398184 2025-12-04T12:50:37.6457648Z I1204 12:46:11.386000 398115 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 398185 2025-12-04T12:50:37.6457993Z I1204 12:46:11.387000 398115 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 398186 2025-12-04T12:50:37.6458333Z I1204 12:46:11.387000 398115 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 398187 2025-12-04T12:50:37.6459204Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6460005Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6460745Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6461488Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6462265Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6463007Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6463738Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6464481Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6465858Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6467278Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6468702Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6470152Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6471605Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6473017Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6474446Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
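The UserWarning above names its own switch: if the stream mismatch is intentional, it can be silenced exactly as the message suggests.

    import torch

    # Per the warning text above: disable the AccumulateGrad stream-mismatch
    # warning when the mismatch is known to be intentional.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)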
2025-12-04T12:50:37.6475888Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6476184Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6476512Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6476989Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6477455Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6477922Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6478356Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6478784Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6479236Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6479736Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6480183Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6480634Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6481070Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6481538Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6481990Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6482694Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 
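The leak message distinguishes two numbers: bytes tracked by PyTorch's caching allocator versus memory the driver has actually reserved. A hedged sketch of reading both views (the device index is illustrative; torch.cuda.mem_get_info reports driver-level free/total bytes):

    import torch

    dev = 0
    alloc = torch.cuda.memory_allocated(dev)        # caching-allocator view
    free_b, total_b = torch.cuda.mem_get_info(dev)  # driver view: (free, total)
    print(f"allocator={alloc} bytes, driver_in_use={total_b - free_b} bytes")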
2025-12-04T12:50:37.6483354Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6483692Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6484344Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6484945Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6485296Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6485698Z E1204 12:46:19.747000 398184 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6486027Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6486352Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6486825Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6487289Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6487752Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6488184Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6488609Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6489058Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6489506Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6490003Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6490452Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6490890Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6491365Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6491819Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6492519Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3053453312. 2025-12-04T12:50:37.6493179Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6493517Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6494196Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6494760Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6495109Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6495507Z E1204 12:46:19.844000 398186 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6495835Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6496160Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6496631Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6497096Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6497560Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6497996Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6498423Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6498870Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6499320Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:50:37.6499812Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6500294Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6500730Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6501169Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6501616Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6502316Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1262485504 and is now 3053453312. 2025-12-04T12:50:37.6502972Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6503347Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6504002Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6504575Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6504929Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6505333Z E1204 12:46:19.861000 398187 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6505665Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6505992Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6506466Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6506931Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6507400Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6507837Z E1204 12:46:19.864000 398185 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6508265Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6508716Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6509167Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6509617Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6510131Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6510576Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6511019Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6511471Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6512179Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3053453312. 
2025-12-04T12:50:37.6512870Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6513209Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6513860Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6514423Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6514776Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6515176Z E1204 12:46:19.864000 398185 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6515410Z FAILED [9.6156s] [ 14%] 2025-12-04T12:50:37.6515475Z 2025-12-04T12:50:37.6515532Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6515774Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:50:37.6516006Z Traceback (most recent call last): 2025-12-04T12:50:37.6516256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6516504Z self._join_processes(fn) 2025-12-04T12:50:37.6516757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6517025Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6517292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6517550Z raise RuntimeError(error) 2025-12-04T12:50:37.6517700Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.6517863Z Traceback (most recent call last): 2025-12-04T12:50:37.6518103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6518343Z getattr(self, test_name)() 2025-12-04T12:50:37.6518576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6518809Z fn() 2025-12-04T12:50:37.6519043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6519277Z method(*args, **kwargs) 2025-12-04T12:50:37.6519497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6519764Z method(*args, **kwargs) 2025-12-04T12:50:37.6519984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6520209Z with policy(): 2025-12-04T12:50:37.6520424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6520657Z raise RuntimeError(msg) 2025-12-04T12:50:37.6521129Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 2025-12-04T12:50:37.6521593Z 2025-12-04T12:50:37.6521669Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6522086Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6522430Z 2025-12-04T12:50:37.6522518Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6522642Z 2025-12-04T12:50:37.6522644Z 2025-12-04T12:50:37.6522722Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6522924Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6523320Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-33df3437092a25cb.xml - 2025-12-04T12:50:37.6523689Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6524106Z FAILED [9.6156s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.6524499Z Traceback (most recent call last): 2025-12-04T12:50:37.6524745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6524990Z getattr(self, test_name)() 2025-12-04T12:50:37.6525224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6525459Z fn() 2025-12-04T12:50:37.6525664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6525895Z method(*args, **kwargs) 2025-12-04T12:50:37.6526119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6526347Z method(*args, **kwargs) 2025-12-04T12:50:37.6526567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6526794Z with policy(): 2025-12-04T12:50:37.6527004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6527235Z raise RuntimeError(msg) 2025-12-04T12:50:37.6527736Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 
2025-12-04T12:50:37.6528171Z 2025-12-04T12:50:37.6528251Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6528667Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6529008Z 2025-12-04T12:50:37.6529097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6529286Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6529452Z ======================= 1 failed, 1 deselected in 9.63s ======================== 2025-12-04T12:50:37.6529592Z Got exit code 1 2025-12-04T12:50:37.6529726Z Retrying single test... 2025-12-04T12:50:37.6530020Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e7cba4b7b2464d6f.xml 2025-12-04T12:50:37.6530383Z ============================= test session starts ============================== 2025-12-04T12:50:37.6530593Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6530779Z cachedir: .pytest_cache 2025-12-04T12:50:37.6531001Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6531243Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6531364Z configfile: pytest.ini 2025-12-04T12:50:37.6531595Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6531870Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.6532272Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6532643Z Running 1 items in this shard 2025-12-04T12:50:37.6532718Z 2025-12-04T12:50:37.6533090Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 12:46:23.680000 398585 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 398654 2025-12-04T12:50:37.6533650Z I1204 12:46:23.680000 398585 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 398655 2025-12-04T12:50:37.6533992Z I1204 12:46:23.681000 398585 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 398656 2025-12-04T12:50:37.6534333Z I1204 12:46:23.681000 398585 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 398657 2025-12-04T12:50:37.6535204Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.6535952Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6536720Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6537464Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6538199Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6538940Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6539671Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6540486Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6541824Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6543237Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6544655Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6546069Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6547516Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6548928Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6550387Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
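Separately, the FutureWarning repeated in this session points at the replacement APIs in torch.distributed.checkpoint.state_dict. Below is a hedged, minimal sketch of the migration under simplifying assumptions: the Linear model and SGD optimizer are placeholders, it runs single-process, and in the test above the model would be FSDP-wrapped first.

    import torch
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    # Placeholder model/optimizer; real use would wrap the model in FSDP.
    model = torch.nn.Linear(4, 4)
    optim = torch.optim.SGD(model.parameters(), lr=0.1)

    # Replacement for FSDP.set_state_dict_type()-based flows, per the
    # warning's doc link.
    model_sd, optim_sd = get_state_dict(model, optim)
    set_state_dict(model, optim,
                   model_state_dict=model_sd,
                   optim_state_dict=optim_sd)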
2025-12-04T12:50:37.6551830Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6552124Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6552453Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6552929Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6553397Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6553862Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6554296Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6554724Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6555174Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6555624Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6556072Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6556521Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6556958Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6557431Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6557882Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6558584Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3238002688. 
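The RuntimeError above is what the mem-leak-check policy raises when `PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1` is set (the repro command echoed below sets it explicitly): memory counters are snapshotted per device before the test body and compared afterwards. The sketch below is an illustration of that idea only, not the actual CudaMemoryLeakCheck implementation in torch.testing._internal.common_utils, and the helper name `assert_no_cuda_leak` is hypothetical:

    # Illustrative stand-in for the leak check; assert_no_cuda_leak is a
    # hypothetical name, and the real policy in common_utils does more.
    import torch

    def assert_no_cuda_leak(test_fn, device=0):
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)  # caching-allocator bytes
        free_before, _ = torch.cuda.mem_get_info(device)    # driver view: (free, total)
        test_fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver free "
                f"{free_before} -> {free_after} bytes"
            )

Here the caching allocator grew from 0 to 13824 bytes (about 13.5 KiB) on each rank, and the driver-level numbers corroborate the growth, which is exactly the combination the error message reports.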
2025-12-04T12:50:37.6559245Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6559582Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6560305Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6560871Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6561222Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6561622Z E1204 12:46:32.003000 398657 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6561952Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6562278Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6562750Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6563215Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6563680Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6564112Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6564536Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6564985Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6565432Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6565882Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6566359Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6566798Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6567239Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6567689Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6568389Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3303014400. 2025-12-04T12:50:37.6569048Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6569422Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6570122Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6570685Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6571037Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6571438Z E1204 12:46:32.010000 398656 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6571767Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6572088Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6572561Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6573024Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6573490Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6573922Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6574347Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6574796Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6575244Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:50:37.6575690Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6576168Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6576606Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6577047Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6577496Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6578192Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3214934016. 2025-12-04T12:50:37.6578881Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6579216Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6579915Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6580478Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6580829Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6581229Z E1204 12:46:32.549000 398654 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6581555Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6581878Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6582348Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6582811Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6583275Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6583707Z E1204 12:46:32.569000 398655 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6584131Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6584577Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6585024Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6585500Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6585949Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6586386Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6586825Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6587274Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6587973Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3053453312. 
2025-12-04T12:50:37.6588660Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6588994Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6589641Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6590247Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6590598Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6590997Z E1204 12:46:32.569000 398655 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6591234Z FAILED [9.7135s] [100%] 2025-12-04T12:50:37.6591305Z 2025-12-04T12:50:37.6591363Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6591607Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:50:37.6591841Z Traceback (most recent call last): 2025-12-04T12:50:37.6592086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6592331Z self._join_processes(fn) 2025-12-04T12:50:37.6592580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6592845Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6593111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6593370Z raise RuntimeError(error) 2025-12-04T12:50:37.6593522Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:50:37.6593685Z Traceback (most recent call last): 2025-12-04T12:50:37.6593924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6594166Z getattr(self, test_name)() 2025-12-04T12:50:37.6594430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6594666Z fn() 2025-12-04T12:50:37.6594869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6595102Z method(*args, **kwargs) 2025-12-04T12:50:37.6595326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6595556Z method(*args, **kwargs) 2025-12-04T12:50:37.6595776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6596004Z with policy(): 2025-12-04T12:50:37.6596216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6596449Z raise RuntimeError(msg) 2025-12-04T12:50:37.6596920Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3303014400. 2025-12-04T12:50:37.6597386Z 2025-12-04T12:50:37.6597462Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6597881Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6598219Z 2025-12-04T12:50:37.6598307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6598431Z 2025-12-04T12:50:37.6598432Z 2025-12-04T12:50:37.6598511Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6598710Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6599105Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e7cba4b7b2464d6f.xml - 2025-12-04T12:50:37.6599473Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6599927Z FAILED [9.7135s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:50:37.6600320Z Traceback (most recent call last): 2025-12-04T12:50:37.6600566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6600808Z getattr(self, test_name)() 2025-12-04T12:50:37.6601043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6601275Z fn() 2025-12-04T12:50:37.6601481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6601710Z method(*args, **kwargs) 2025-12-04T12:50:37.6601928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6602155Z method(*args, **kwargs) 2025-12-04T12:50:37.6602374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6602599Z with policy(): 2025-12-04T12:50:37.6602811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6603042Z raise RuntimeError(msg) 2025-12-04T12:50:37.6603551Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3303014400. 
2025-12-04T12:50:37.6603983Z 2025-12-04T12:50:37.6604060Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6604474Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6604815Z 2025-12-04T12:50:37.6604901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6605089Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6605253Z ======================= 1 failed, 7 deselected in 9.72s ======================== 2025-12-04T12:50:37.6605392Z Got exit code 1 2025-12-04T12:50:37.6605492Z Retrying single test... 2025-12-04T12:50:37.6605782Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-245b018540739122.xml 2025-12-04T12:50:37.6606133Z ============================= test session starts ============================== 2025-12-04T12:50:37.6606342Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6606530Z cachedir: .pytest_cache 2025-12-04T12:50:37.6606753Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6606994Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6614254Z configfile: pytest.ini 2025-12-04T12:50:37.6614492Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6614771Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.6615178Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6615553Z Running 1 items in this shard 2025-12-04T12:50:37.6615627Z 2025-12-04T12:50:37.6616001Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 12:46:36.071000 399055 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 399124 2025-12-04T12:50:37.6616563Z I1204 12:46:36.072000 399055 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 399125 2025-12-04T12:50:37.6616906Z I1204 12:46:36.072000 399055 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 399126 2025-12-04T12:50:37.6617248Z I1204 12:46:36.073000 399055 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 399127 2025-12-04T12:50:37.6618126Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.6618881Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6619739Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6620489Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6621227Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6621969Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6622706Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6623486Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6624824Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6626239Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6627670Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6629078Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6630654Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:50:37.6632067Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6633487Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
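The retry reproduces the same warning sequence: the FSDP.set_state_dict_type FutureWarnings at the top of the session, then this AccumulateGrad warning on each rank. The replacement APIs the FutureWarning points to live in torch.distributed.checkpoint.state_dict; a minimal migration sketch, with `model` and `optim` standing in for an FSDP-wrapped module and its optimizer:

    # Minimal migration sketch for the FSDP.set_state_dict_type FutureWarning;
    # model and optim are placeholders supplied by the caller.
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    def roundtrip_state_dict(model, optim):
        # One getter/setter pair that the warning says covers FSDP1, FSDP2,
        # and DDP, replacing the deprecated FSDP.state_dict_type() /
        # FSDP.set_state_dict_type() calls.
        model_sd, optim_sd = get_state_dict(model, optim)
        set_state_dict(model, optim,
                       model_state_dict=model_sd, optim_state_dict=optim_sd)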
2025-12-04T12:50:37.6634935Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.6635230Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6635562Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6636040Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6636506Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6636975Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6637409Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6637836Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6638291Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6638744Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6639191Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6639671Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6640151Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6640592Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6641045Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6641752Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3053453312. 
2025-12-04T12:50:37.6642417Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6642813Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6643468Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6644037Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6644390Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6644793Z E1204 12:46:44.466000 399126 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6645125Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6645449Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6645923Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6646390Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6646858Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6647290Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6647717Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6648167Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6648617Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6649065Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6649561Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6650261Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6650701Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6651153Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6651858Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1254096896 and is now 3053453312. 2025-12-04T12:50:37.6652549Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6652886Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6653537Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6654101Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6654453Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6654856Z E1204 12:46:44.479000 399127 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6655187Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6655513Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6655987Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6656453Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6656924Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6657359Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6657783Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6658232Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6658683Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:50:37.6659165Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6659615Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6660096Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6660538Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6660991Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6661692Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3053453312. 2025-12-04T12:50:37.6662386Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6662721Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6663371Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6663941Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6664292Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6664693Z E1204 12:46:44.523000 399125 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6665022Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6665344Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6665815Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6666280Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6666748Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6667181Z E1204 12:46:44.549000 399124 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6667605Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6668054Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6668532Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6668982Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6669436Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6669912Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6670351Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6673033Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6673766Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3212836864. 
2025-12-04T12:50:37.6674426Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6674764Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6675419Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6675999Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6676348Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6676751Z E1204 12:46:44.549000 399124 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6676988Z FAILED [9.7171s] [100%] 2025-12-04T12:50:37.6677057Z 2025-12-04T12:50:37.6677123Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6677371Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:50:37.6677608Z Traceback (most recent call last): 2025-12-04T12:50:37.6677861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6678111Z self._join_processes(fn) 2025-12-04T12:50:37.6678361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6678628Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6678897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6679158Z raise RuntimeError(error) 2025-12-04T12:50:37.6679316Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:50:37.6679479Z Traceback (most recent call last): 2025-12-04T12:50:37.6679755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6680033Z getattr(self, test_name)() 2025-12-04T12:50:37.6680267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6680502Z fn() 2025-12-04T12:50:37.6680705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6680937Z method(*args, **kwargs) 2025-12-04T12:50:37.6681158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6681389Z method(*args, **kwargs) 2025-12-04T12:50:37.6681607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6681835Z with policy(): 2025-12-04T12:50:37.6682046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6682378Z raise RuntimeError(msg) 2025-12-04T12:50:37.6682867Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3053453312. 2025-12-04T12:50:37.6683300Z 2025-12-04T12:50:37.6683376Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6683789Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6684127Z 2025-12-04T12:50:37.6684217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6684340Z 2025-12-04T12:50:37.6684343Z 2025-12-04T12:50:37.6684427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6684630Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6685024Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-245b018540739122.xml - 2025-12-04T12:50:37.6685388Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6685802Z FAILED [9.7171s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:50:37.6686193Z Traceback (most recent call last): 2025-12-04T12:50:37.6686438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6686686Z getattr(self, test_name)() 2025-12-04T12:50:37.6686921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6687153Z fn() 2025-12-04T12:50:37.6687355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6687586Z method(*args, **kwargs) 2025-12-04T12:50:37.6687805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6688035Z method(*args, **kwargs) 2025-12-04T12:50:37.6688254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6688479Z with policy(): 2025-12-04T12:50:37.6688691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6688953Z raise RuntimeError(msg) 2025-12-04T12:50:37.6689422Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3053453312. 
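For scale, the deltas quoted in this failure are small on the allocator side and large on the driver side (arithmetic on the figures above, nothing assumed):

    # Byte deltas quoted in the RuntimeError above, converted for readability.
    alloc_delta = 13824 - 0                 # caching allocator: 13.5 KiB
    driver_delta = 3053453312 - 1268776960  # driver-allocated: ~1.78 GB
    print(alloc_delta / 1024)               # 13.5 (KiB)
    print(driver_delta / 1e9)               # ~1.78 (GB)

The driver-side jump plausibly includes lazily initialized context and cache memory rather than test allocations alone, but per the message wording the check uses the driver numbers to corroborate the allocator delta before declaring a leak, which is why all four ranks fail the same way.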
2025-12-04T12:50:37.6689890Z 2025-12-04T12:50:37.6689966Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6690384Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6690723Z 2025-12-04T12:50:37.6690810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6690997Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6691198Z ======================= 1 failed, 7 deselected in 9.73s ======================== 2025-12-04T12:50:37.6691352Z Got exit code 1 2025-12-04T12:50:37.6691661Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6692077Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:50:37.6692467Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-4af3cf649b2af802.xml 2025-12-04T12:50:37.6692790Z ============================= test session starts ============================== 2025-12-04T12:50:37.6693001Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6693190Z cachedir: .pytest_cache 2025-12-04T12:50:37.6693417Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6693660Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6693779Z configfile: pytest.ini 2025-12-04T12:50:37.6694006Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6694082Z collecting ... collected 8 items / 2 deselected / 6 selected 2025-12-04T12:50:37.6694139Z stepcurrent: skipping 2 already run items. 2025-12-04T12:50:37.6694184Z Running 6 items in this shard 2025-12-04T12:50:37.6694188Z 2025-12-04T12:50:37.6694567Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 12:46:48.318000 399525 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 399594 2025-12-04T12:50:37.6694729Z I1204 12:46:48.319000 399525 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 399595 2025-12-04T12:50:37.6694890Z I1204 12:46:48.320000 399525 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 399596 2025-12-04T12:50:37.6695044Z I1204 12:46:48.320000 399525 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 399597 2025-12-04T12:50:37.6695725Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6695773Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6696476Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6696521Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6697198Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6697268Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6697939Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6697984Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6698489Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6698544Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6699036Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6699086Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6699580Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:50:37.6699628Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6700162Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6700210Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6700348Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6700504Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6700820Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6700972Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6701252Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6701370Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6701641Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6701803Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6702090Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6702233Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6702505Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6702635Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6702910Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6703052Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6703580Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3707764736. 2025-12-04T12:50:37.6703691Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6703883Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6704314Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6704425Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6704632Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6704793Z E1204 12:46:56.591000 399594 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6704927Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6705107Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6705390Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6705540Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6705819Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6705936Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6706206Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6706375Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6706647Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6706789Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6707056Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6707189Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T12:50:37.6707462Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6707604Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6708125Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3575644160. 2025-12-04T12:50:37.6708233Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6708427Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6708851Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6708962Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6709170Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6709328Z E1204 12:46:56.598000 399595 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6709484Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6709638Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6709959Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6710106Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6710385Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6710504Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6710807Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6710950Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6711219Z E1204 12:46:57.145000 399597 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6711363Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6711636Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6711769Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6712040Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6712182Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6712704Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3307208704. 2025-12-04T12:50:37.6712812Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6713004Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6713427Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6713536Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6713743Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6713931Z E1204 12:46:57.145000 399597 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6714065Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6714218Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6714498Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6714644Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6714924Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6715065Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6715337Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6715480Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6715748Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6715892Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6716165Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6716297Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6716567Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6716709Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6717229Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3307208704. 
2025-12-04T12:50:37.6717339Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6717529Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6717950Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6718059Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6718286Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6718449Z E1204 12:46:57.151000 399596 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6718492Z FAILED [9.7150s] [ 16%] 2025-12-04T12:50:37.6718494Z 2025-12-04T12:50:37.6718551Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6718705Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:50:37.6718754Z Traceback (most recent call last): 2025-12-04T12:50:37.6718920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6718965Z self._join_processes(fn) 2025-12-04T12:50:37.6719142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6719222Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6719405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6719450Z raise RuntimeError(error) 2025-12-04T12:50:37.6719532Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6719579Z Traceback (most recent call last): 2025-12-04T12:50:37.6719780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6719824Z getattr(self, test_name)() 2025-12-04T12:50:37.6719988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6720024Z fn() 2025-12-04T12:50:37.6720180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6720223Z method(*args, **kwargs) 2025-12-04T12:50:37.6720375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6720417Z method(*args, **kwargs) 2025-12-04T12:50:37.6720570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6720608Z with policy(): 2025-12-04T12:50:37.6720764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6720807Z raise RuntimeError(msg) 2025-12-04T12:50:37.6721211Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3575644160. 2025-12-04T12:50:37.6721215Z 2025-12-04T12:50:37.6721294Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6721598Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6721601Z 2025-12-04T12:50:37.6721691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6721693Z 2025-12-04T12:50:37.6721695Z 2025-12-04T12:50:37.6721772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6721864Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6722179Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-4af3cf649b2af802.xml - 2025-12-04T12:50:37.6722243Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6722559Z FAILED [9.7150s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6722606Z Traceback (most recent call last): 2025-12-04T12:50:37.6722774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6722817Z getattr(self, test_name)() 2025-12-04T12:50:37.6722980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6723015Z fn() 2025-12-04T12:50:37.6723172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6723243Z method(*args, **kwargs) 2025-12-04T12:50:37.6723396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6723436Z method(*args, **kwargs) 2025-12-04T12:50:37.6723589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6723627Z with policy(): 2025-12-04T12:50:37.6723781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6723822Z raise RuntimeError(msg) 2025-12-04T12:50:37.6724227Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3575644160. 
2025-12-04T12:50:37.6724231Z 2025-12-04T12:50:37.6724309Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6724614Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6724616Z 2025-12-04T12:50:37.6724705Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6724769Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6724834Z ======================= 1 failed, 2 deselected in 9.72s ======================== 2025-12-04T12:50:37.6724873Z Got exit code 1 2025-12-04T12:50:37.6724916Z Retrying single test... 2025-12-04T12:50:37.6725145Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-12528ec4b19d26b7.xml 2025-12-04T12:50:37.6725207Z ============================= test session starts ============================== 2025-12-04T12:50:37.6725321Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6725365Z cachedir: .pytest_cache 2025-12-04T12:50:37.6725522Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6725571Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6725613Z configfile: pytest.ini 2025-12-04T12:50:37.6725779Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6725853Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.6726175Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6726223Z Running 1 items in this shard 2025-12-04T12:50:37.6726225Z 2025-12-04T12:50:37.6726599Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 12:47:00.790000 399995 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 400064 2025-12-04T12:50:37.6726757Z I1204 12:47:00.791000 399995 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 400065 2025-12-04T12:50:37.6726910Z I1204 12:47:00.792000 399995 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 400066 2025-12-04T12:50:37.6727063Z I1204 12:47:00.793000 399995 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 400067 2025-12-04T12:50:37.6727755Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.6727815Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6728488Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6728533Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6729198Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6729244Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6729961Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6730007Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6730508Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6730560Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6731081Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6731131Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6731620Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6731668Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6732158Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6732241Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6732382Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6732540Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6732826Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6732976Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6733257Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6733378Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6733649Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6733793Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6734063Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6734209Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6734482Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6734612Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6734886Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6735029Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6735577Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3584032768. 
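The FutureWarning repeated above names the replacement API directly. A minimal migration sketch, assuming a torch build that ships torch.distributed.checkpoint.state_dict (the model and optimizer here are plain placeholders; in the failing test they would be the FSDP-wrapped module and its optimizer, and the offload_to_cpu knob in the test name corresponds to cpu_offload below):

    import torch
    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    # Placeholders for illustration; the test uses an FSDP-wrapped model.
    model = torch.nn.Linear(8, 8)
    optimizer = torch.optim.Adam(model.parameters())

    # Sharded state dict, optionally offloaded to CPU.
    options = StateDictOptions(full_state_dict=False, cpu_offload=True)

    # One API surface for FSDP1, FSDP2, and DDP, per the warning text.
    model_sd, optim_sd = get_state_dict(model, optimizer, options=options)

    # ... save/load via torch.distributed.checkpoint, then restore:
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=options,
    )

This replaces the deprecated FSDP.set_state_dict_type() context that the test at test_hsdp_dtensor_state_dict.py:188 still uses.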
2025-12-04T12:50:37.6735692Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6735883Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6736308Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6736418Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6736653Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6736811Z E1204 12:47:08.986000 400065 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6736945Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6737098Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6737380Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6737531Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6737810Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6737927Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6738195Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6738338Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6738608Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6738752Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6739023Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6739151Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6739423Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6739564Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6740157Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3831496704. 2025-12-04T12:50:37.6740267Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6740458Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6740885Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6741024Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6741229Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6741388Z E1204 12:47:08.993000 400064 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6741520Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6741672Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6741956Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6742104Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6742386Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6742503Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6742772Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6742915Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6743186Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:50:37.6743329Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6743598Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6743730Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6744027Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6744171Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6744691Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3307208704. 2025-12-04T12:50:37.6744799Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6744991Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6745415Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6745548Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6745753Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6745910Z E1204 12:47:09.512000 400066 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6746042Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6746197Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6746485Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6746632Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6746911Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6747028Z E1204 12:47:09.524000 400067 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6747299Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6747443Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6747714Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6747856Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6748124Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6748255Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6748550Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6748695Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6749215Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1254096896 and is now 3307208704. 
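The UserWarning about `_get_pg_default_device` repeated through this run concerns how a device gets picked for object collectives. The suggested replacements (`_get_object_coll_device`, `_device_capability`) are private helpers and subject to change; on the public API side the same ambiguity can be sidestepped by passing the device explicitly, e.g. (illustrative sketch, assumes an initialized process group; broadcast_config is a hypothetical helper):

    import torch
    import torch.distributed as dist

    def broadcast_config(cfg, rank):
        # Passing device= explicitly avoids relying on the process group's
        # default-device guess that _get_pg_default_device used to make.
        obj_list = [cfg if rank == 0 else None]
        dist.broadcast_object_list(obj_list, src=0,
                                   device=torch.device("cuda", rank))
        return obj_list[0]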
2025-12-04T12:50:37.6749323Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6749518Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6750005Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6750114Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6750318Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6750476Z E1204 12:47:09.524000 400067 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6750520Z FAILED [9.5182s] [100%] 2025-12-04T12:50:37.6750524Z 2025-12-04T12:50:37.6750581Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6750736Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:50:37.6750783Z Traceback (most recent call last): 2025-12-04T12:50:37.6750949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6750994Z self._join_processes(fn) 2025-12-04T12:50:37.6751169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6751224Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6751406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6751451Z raise RuntimeError(error) 2025-12-04T12:50:37.6751536Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.6751584Z Traceback (most recent call last): 2025-12-04T12:50:37.6751748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6751792Z getattr(self, test_name)() 2025-12-04T12:50:37.6751954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6751989Z fn() 2025-12-04T12:50:37.6752144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6752186Z method(*args, **kwargs) 2025-12-04T12:50:37.6752340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6752383Z method(*args, **kwargs) 2025-12-04T12:50:37.6752589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6752634Z with policy(): 2025-12-04T12:50:37.6752787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6752831Z raise RuntimeError(msg) 2025-12-04T12:50:37.6753233Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3831496704. 2025-12-04T12:50:37.6753235Z 2025-12-04T12:50:37.6753314Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6753621Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6753666Z 2025-12-04T12:50:37.6753757Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6753760Z 2025-12-04T12:50:37.6753819Z Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6753868Z Traceback (most recent call last): 2025-12-04T12:50:37.6754033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6754076Z getattr(self, test_name)() 2025-12-04T12:50:37.6754238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6754273Z fn() 2025-12-04T12:50:37.6754429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6754469Z method(*args, **kwargs) 2025-12-04T12:50:37.6754624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6754665Z method(*args, **kwargs) 2025-12-04T12:50:37.6754816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6754853Z with policy(): 2025-12-04T12:50:37.6755005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6755046Z raise RuntimeError(msg) 2025-12-04T12:50:37.6755446Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3584032768. 2025-12-04T12:50:37.6755448Z 2025-12-04T12:50:37.6755526Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6755831Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6755834Z 2025-12-04T12:50:37.6755921Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6755923Z 2025-12-04T12:50:37.6755925Z 2025-12-04T12:50:37.6756001Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6756090Z Process 0 terminated with exit code 10, terminating remaining processes. 
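The "Process 0 terminated with exit code 10, terminating remaining processes." line shows the parent side of the harness: once any rank exits nonzero, the remaining ranks are terminated (so survivors cannot hang in collectives waiting for the dead peer) and the error is re-raised in the parent, which is why the pytest FAILURES section wraps the child traceback in "Process N exited with error code 10 and exception". A simplified sketch in the spirit of _join_processes / _check_return_codes:

    import time

    def join_and_check(processes):
        # Poll children; on the first nonzero exit, terminate the rest
        # and surface the failure to the test framework.
        while True:
            for i, p in enumerate(processes):
                if p.exitcode not in (None, 0):
                    print(f"Process {i} terminated with exit code {p.exitcode}, "
                          "terminating remaining processes.")
                    for other in processes:
                        if other.exitcode is None:
                            other.terminate()
                    raise RuntimeError(
                        f"Process {i} exited with error code {p.exitcode}")
            if all(p.exitcode == 0 for p in processes):
                return
            time.sleep(0.1)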
2025-12-04T12:50:37.6756363Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-12528ec4b19d26b7.xml -
2025-12-04T12:50:37.6756425Z =========================== short test summary info ============================
2025-12-04T12:50:37.6756768Z FAILED [9.5182s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T12:50:37.6756817Z Traceback (most recent call last):
2025-12-04T12:50:37.6756980Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.6757025Z     getattr(self, test_name)()
2025-12-04T12:50:37.6757184Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.6757220Z     fn()
2025-12-04T12:50:37.6757371Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6757412Z     method(*args, **kwargs)
2025-12-04T12:50:37.6757591Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6757644Z     method(*args, **kwargs)
2025-12-04T12:50:37.6757795Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.6757832Z     with policy():
2025-12-04T12:50:37.6757984Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.6758025Z     raise RuntimeError(msg)
2025-12-04T12:50:37.6758424Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3831496704.
2025-12-04T12:50:37.6758426Z 
2025-12-04T12:50:37.6758502Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.6758806Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda
2025-12-04T12:50:37.6758810Z 
2025-12-04T12:50:37.6758895Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.6758899Z 
2025-12-04T12:50:37.6758957Z Process 1 exited with error code 10 and exception:
2025-12-04T12:50:37.6759004Z Traceback (most recent call last):
2025-12-04T12:50:37.6759165Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.6759208Z     getattr(self, test_name)()
2025-12-04T12:50:37.6759367Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.6759405Z     fn()
2025-12-04T12:50:37.6759557Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6759599Z     method(*args, **kwargs)
2025-12-04T12:50:37.6759793Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6759835Z     method(*args, **kwargs)
2025-12-04T12:50:37.6759985Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.6760024Z     with policy():
2025-12-04T12:50:37.6760178Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.6760221Z     raise RuntimeError(msg)
2025-12-04T12:50:37.6760651Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3584032768.
2025-12-04T12:50:37.6760655Z 
2025-12-04T12:50:37.6760730Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.6761031Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda
2025-12-04T12:50:37.6761035Z 
2025-12-04T12:50:37.6761120Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.6761185Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:50:37.6761248Z ======================= 1 failed, 7 deselected in 9.53s ========================
2025-12-04T12:50:37.6761288Z Got exit code 1
2025-12-04T12:50:37.6761329Z Retrying single test...
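The RuntimeError above is raised by the leak-check policy that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: it records per-device memory before the test body runs and fails the test if memory is still elevated afterwards. Note the log reports growth in both counters, the caching allocator (0 -> 13824 bytes) and the driver-level allocation, since driver growth alone can be lazy-initialization noise. A rough standalone analogue using only public torch.cuda calls (a sketch under those assumptions, not torch's actual leak-check class in common_utils.py):

import contextlib
import torch

@contextlib.contextmanager
def cuda_leak_check():
    # snapshot per-device memory: caching-allocator bytes and driver-level
    # "used" bytes (total - free, as reported by the driver)
    torch.cuda.synchronize()
    torch.cuda.empty_cache()
    devs = range(torch.cuda.device_count())
    alloc_before = [torch.cuda.memory_allocated(d) for d in devs]
    driver_before = []
    for d in devs:
        free, total = torch.cuda.mem_get_info(d)
        driver_before.append(total - free)
    try:
        yield
    finally:
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        for d in devs:
            alloc_now = torch.cuda.memory_allocated(d)
            free, total = torch.cuda.mem_get_info(d)
            driver_now = total - free
            # require both counters to grow, so one-time driver/context
            # initialization is not misreported as a leak
            if alloc_now > alloc_before[d] and driver_now > driver_before[d]:
                raise RuntimeError(
                    f"possible leak on device {d}: caching allocator "
                    f"{alloc_before[d]} -> {alloc_now}, driver "
                    f"{driver_before[d]} -> {driver_now}")

# usage:
# with cuda_leak_check():
#     run_test_body()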
2025-12-04T12:50:37.6761574Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-304e326ed9ae6d3a.xml 2025-12-04T12:50:37.6761644Z ============================= test session starts ============================== 2025-12-04T12:50:37.6761758Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6761800Z cachedir: .pytest_cache 2025-12-04T12:50:37.6761960Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6762006Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6762049Z configfile: pytest.ini 2025-12-04T12:50:37.6762212Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6762287Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.6762584Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6762631Z Running 1 items in this shard 2025-12-04T12:50:37.6762633Z 2025-12-04T12:50:37.6763009Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 12:47:12.976000 400465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 400534 2025-12-04T12:50:37.6763163Z I1204 12:47:12.976000 400465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 400535 2025-12-04T12:50:37.6763316Z I1204 12:47:12.977000 400465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 400536 2025-12-04T12:50:37.6763469Z I1204 12:47:12.978000 400465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 400537 2025-12-04T12:50:37.6764155Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6764199Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6764888Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6764934Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6765602Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6765645Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6766313Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6766381Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6766881Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6766931Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6767423Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6767473Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6767962Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6768010Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6768496Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:50:37.6768546Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6768683Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6768840Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6769124Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6769273Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6769586Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6769746Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6770019Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6770160Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6770431Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6770587Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6770873Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6771005Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6771278Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6771421Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6771944Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3649044480. 
2025-12-04T12:50:37.6772057Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6772248Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6772673Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6772787Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6772991Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6773152Z E1204 12:47:21.318000 400537 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6773284Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6773439Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6773720Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6773897Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6774177Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6774295Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6774564Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6774705Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6774977Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6775138Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6775411Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6775540Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6775814Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6775958Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6776480Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3470786560. 2025-12-04T12:50:37.6776590Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6776781Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6777208Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6777316Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6777522Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6777681Z E1204 12:47:21.771000 400534 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6777811Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6777966Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6778270Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6778421Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6778697Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6778815Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6779085Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6779226Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6779519Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:50:37.6779658Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6779978Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6780107Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6780384Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6780528Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6781047Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3307208704. 2025-12-04T12:50:37.6781156Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6781344Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6781772Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6781879Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6782082Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6782240Z E1204 12:47:21.816000 400535 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6782370Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6782551Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6782835Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6782983Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6783260Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6783377Z E1204 12:47:21.848000 400536 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6783647Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6783814Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6784085Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6784224Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6784493Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6784620Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6784895Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6785038Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6785558Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1256194048 and is now 3307208704. 
2025-12-04T12:50:37.6785666Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6785856Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6786280Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6786387Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6786589Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6786746Z E1204 12:47:21.848000 400536 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6786790Z FAILED [9.8149s] [100%] 2025-12-04T12:50:37.6786817Z 2025-12-04T12:50:37.6786875Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6787028Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:50:37.6787076Z Traceback (most recent call last): 2025-12-04T12:50:37.6787238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6787284Z self._join_processes(fn) 2025-12-04T12:50:37.6787457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6787513Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6787690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6787754Z raise RuntimeError(error) 2025-12-04T12:50:37.6787834Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.6787891Z Traceback (most recent call last): 2025-12-04T12:50:37.6788053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6788096Z getattr(self, test_name)() 2025-12-04T12:50:37.6788254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6788292Z fn() 2025-12-04T12:50:37.6788443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6788486Z method(*args, **kwargs) 2025-12-04T12:50:37.6788637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6788680Z method(*args, **kwargs) 2025-12-04T12:50:37.6788831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6788871Z with policy(): 2025-12-04T12:50:37.6789024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6789067Z raise RuntimeError(msg) 2025-12-04T12:50:37.6789469Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3649044480. 2025-12-04T12:50:37.6789473Z 2025-12-04T12:50:37.6789549Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6789895Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6789899Z 2025-12-04T12:50:37.6789986Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6789988Z 2025-12-04T12:50:37.6789991Z 2025-12-04T12:50:37.6790067Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6790155Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6790428Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-304e326ed9ae6d3a.xml - 2025-12-04T12:50:37.6790489Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6790835Z FAILED [9.8149s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.6790886Z Traceback (most recent call last): 2025-12-04T12:50:37.6791050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6791094Z getattr(self, test_name)() 2025-12-04T12:50:37.6791255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6791291Z fn() 2025-12-04T12:50:37.6791442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6791484Z method(*args, **kwargs) 2025-12-04T12:50:37.6791636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6791678Z method(*args, **kwargs) 2025-12-04T12:50:37.6791845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6791899Z with policy(): 2025-12-04T12:50:37.6792051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6792094Z raise RuntimeError(msg) 2025-12-04T12:50:37.6792493Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3649044480. 
2025-12-04T12:50:37.6792497Z 2025-12-04T12:50:37.6792572Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6792880Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6792884Z 2025-12-04T12:50:37.6792969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6793034Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6793096Z ======================= 1 failed, 7 deselected in 9.83s ======================== 2025-12-04T12:50:37.6793136Z Got exit code 1 2025-12-04T12:50:37.6793388Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6793519Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:50:37.6793748Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7e25bc3b8d1bd830.xml 2025-12-04T12:50:37.6793809Z ============================= test session starts ============================== 2025-12-04T12:50:37.6793924Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6793966Z cachedir: .pytest_cache 2025-12-04T12:50:37.6794335Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6794382Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6794422Z configfile: pytest.ini 2025-12-04T12:50:37.6794586Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6794660Z collecting ... collected 8 items / 3 deselected / 5 selected 2025-12-04T12:50:37.6794714Z stepcurrent: skipping 3 already run items. 2025-12-04T12:50:37.6794759Z Running 5 items in this shard 2025-12-04T12:50:37.6794762Z 2025-12-04T12:50:37.6795160Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 12:47:25.515000 400935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 401004 2025-12-04T12:50:37.6795317Z I1204 12:47:25.516000 400935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 401005 2025-12-04T12:50:37.6795469Z I1204 12:47:25.517000 400935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 401006 2025-12-04T12:50:37.6795622Z I1204 12:47:25.517000 400935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 401007 2025-12-04T12:50:37.6796304Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6796370Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6797044Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6797086Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6797759Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6797804Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6798467Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6798513Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6799008Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6799059Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6799550Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6799599Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6800158Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:50:37.6800206Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6800696Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6800742Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6800898Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6801076Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6801358Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6801506Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6801786Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6801906Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6802178Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6802320Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6802593Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6802735Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6803007Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6803138Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6803413Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6803554Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6804078Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! 
Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3594518528. 2025-12-04T12:50:37.6804210Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6804401Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6804825Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6804934Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6805141Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6805312Z E1204 12:47:33.698000 401005 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6805457Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6805609Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6805889Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6806036Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6806314Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6806433Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6806702Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6806844Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6807111Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6807254Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6807527Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6807658Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T12:50:37.6807929Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6808070Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6808608Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3682598912. 2025-12-04T12:50:37.6808719Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6808910Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6809333Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6809440Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6809660Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6809872Z E1204 12:47:33.747000 401004 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6810003Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6810155Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6810435Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6810581Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6810862Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6810980Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6811249Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6811392Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6811662Z E1204 12:47:34.205000 401006 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6811806Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6812078Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6812208Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6812482Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6812623Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6813168Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3284140032. 2025-12-04T12:50:37.6813279Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6813470Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6813890Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6814015Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6814233Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6814391Z E1204 12:47:34.205000 401006 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6814523Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6814675Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6814955Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6815105Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6815384Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6815501Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6815770Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6815912Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6816184Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6816327Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6816597Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6816727Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6816997Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6817166Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6817683Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3284140032. 
2025-12-04T12:50:37.6817791Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6817981Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6818403Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6818539Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6818743Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6818902Z E1204 12:47:34.242000 401007 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6818943Z FAILED [9.3166s] [ 20%] 2025-12-04T12:50:37.6818945Z 2025-12-04T12:50:37.6819001Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6819152Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:50:37.6819202Z Traceback (most recent call last): 2025-12-04T12:50:37.6819366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6819411Z self._join_processes(fn) 2025-12-04T12:50:37.6819586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6819641Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6819865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6819910Z raise RuntimeError(error) 2025-12-04T12:50:37.6819992Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6820039Z Traceback (most recent call last): 2025-12-04T12:50:37.6820201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6820247Z getattr(self, test_name)() 2025-12-04T12:50:37.6820408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6820443Z fn() 2025-12-04T12:50:37.6820595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6820636Z method(*args, **kwargs) 2025-12-04T12:50:37.6820790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6820832Z method(*args, **kwargs) 2025-12-04T12:50:37.6820983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6821023Z with policy(): 2025-12-04T12:50:37.6821204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6821249Z raise RuntimeError(msg) 2025-12-04T12:50:37.6821649Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3594518528. 2025-12-04T12:50:37.6821652Z 2025-12-04T12:50:37.6821732Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6822033Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6822035Z 2025-12-04T12:50:37.6822126Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6822128Z 2025-12-04T12:50:37.6822144Z 2025-12-04T12:50:37.6822222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6822326Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6822601Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7e25bc3b8d1bd830.xml - 2025-12-04T12:50:37.6822663Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6822977Z FAILED [9.3166s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6823024Z Traceback (most recent call last): 2025-12-04T12:50:37.6823191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6823236Z getattr(self, test_name)() 2025-12-04T12:50:37.6823400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6823434Z fn() 2025-12-04T12:50:37.6823588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6823628Z method(*args, **kwargs) 2025-12-04T12:50:37.6823780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6823820Z method(*args, **kwargs) 2025-12-04T12:50:37.6823972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6824009Z with policy(): 2025-12-04T12:50:37.6824163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6824208Z raise RuntimeError(msg) 2025-12-04T12:50:37.6824608Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3594518528. 
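The repro line repeated throughout this log sets two environment variables in front of a single-test invocation. When scripting the rerun, the same command can be driven from Python; the variable names and test id below are copied verbatim from the log, and the subprocess wrapper is just one way to launch it:

    import os
    import subprocess

    # Env vars and test id copied verbatim from the repro lines above.
    env = dict(os.environ,
               PYTORCH_TEST_WITH_ROCM="1",
               PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    subprocess.run(
        ["python", "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
         "TestHSDPWithDeviceMeshAndDTensorCUDA."
         "test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda"],
        env=env,
        check=False,  # inspect returncode instead of raising
    )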
2025-12-04T12:50:37.6824610Z 2025-12-04T12:50:37.6824687Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6824988Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6824990Z 2025-12-04T12:50:37.6825078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6825141Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6825227Z ======================= 1 failed, 3 deselected in 9.33s ======================== 2025-12-04T12:50:37.6825266Z Got exit code 1 2025-12-04T12:50:37.6825308Z Retrying single test... 2025-12-04T12:50:37.6825533Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-962687944c0b89e2.xml 2025-12-04T12:50:37.6825594Z ============================= test session starts ============================== 2025-12-04T12:50:37.6825706Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6825750Z cachedir: .pytest_cache 2025-12-04T12:50:37.6825906Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6825954Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6825996Z configfile: pytest.ini 2025-12-04T12:50:37.6826177Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6826261Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.6826556Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6826602Z Running 1 items in this shard 2025-12-04T12:50:37.6826604Z 2025-12-04T12:50:37.6826977Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 12:47:37.617000 401405 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 401474 2025-12-04T12:50:37.6827134Z I1204 12:47:37.618000 401405 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 401475 2025-12-04T12:50:37.6827288Z I1204 12:47:37.618000 401405 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 401476 2025-12-04T12:50:37.6827441Z I1204 12:47:37.619000 401405 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 401477 2025-12-04T12:50:37.6828123Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.6828167Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6828837Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6828882Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6829550Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6829617Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6830318Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6830363Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6830864Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6830931Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6831438Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6831486Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6831976Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6832025Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6832514Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6832562Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6832697Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6832853Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6833137Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6833288Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6833568Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6833685Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6833957Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6834100Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6834405Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6834547Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6834816Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6834945Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6835216Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6835371Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6835905Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3607101440. 
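The FutureWarning emitted once per rank above points at the replacement APIs by URL: torch.distributed.checkpoint.state_dict.get_state_dict and set_state_dict. A hedged migration sketch, assuming the documented call shape; the model and optimizer arguments are placeholders, not objects from this test:

    # Migration sketch per the FutureWarning above. Keyword names follow
    # the documented API and are not verified against this PyTorch build.
    from torch.distributed.checkpoint.state_dict import (
        get_state_dict,
        set_state_dict,
    )

    def state_dict_roundtrip(model, optim):
        # Replaces the deprecated FSDP.set_state_dict_type() flow.
        model_sd, optim_sd = get_state_dict(model, optimizers=optim)
        set_state_dict(
            model,
            optimizers=optim,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
        )
        return model_sd, optim_sd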
2025-12-04T12:50:37.6836017Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6836206Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6836630Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6836741Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6836945Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6837103Z E1204 12:47:45.897000 401474 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6837235Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6837389Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6837672Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6837822Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6838100Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6838216Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6838486Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6838650Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6838920Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6839061Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6839331Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6839460Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6839779Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6839947Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6840465Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3384803328. 2025-12-04T12:50:37.6840573Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6840764Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6841190Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6841297Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6841501Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6841658Z E1204 12:47:45.899000 401477 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6841789Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6841945Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6842228Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6842376Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6842653Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6842769Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6843065Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6843208Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6843476Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:50:37.6843617Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6843886Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6844028Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6844313Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6844455Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6844972Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3449815040. 2025-12-04T12:50:37.6845080Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6845272Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6845695Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6845802Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6846005Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6846164Z E1204 12:47:45.923000 401476 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6846298Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6846449Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6846732Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6846878Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6847157Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6847294Z E1204 12:47:46.397000 401475 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6847565Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6847707Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6847975Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6848116Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6848386Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6848537Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6848810Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6848952Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6849472Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3284140032. 
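Once each worker logs its exception and exits with code 10, the parent's _join_processes and _check_return_codes frames (visible in every traceback here) turn the nonzero exit into the single "Process N exited with error code 10" RuntimeError that pytest reports. The pattern, reduced to a sketch with illustrative names rather than the common_distributed.py internals:

    # Reduced sketch of the join-then-check pattern from the tracebacks;
    # function and variable names are illustrative.
    import multiprocessing as mp

    def check_return_codes(procs: "list[mp.Process]"):
        for rank, proc in enumerate(procs):
            proc.join()
            if proc.exitcode != 0:
                raise RuntimeError(
                    f"Process {rank} exited with error code {proc.exitcode}")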
2025-12-04T12:50:37.6849581Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6849848Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6850268Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6850377Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6850582Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6850742Z E1204 12:47:46.397000 401475 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6850783Z FAILED [9.5141s] [100%] 2025-12-04T12:50:37.6850785Z 2025-12-04T12:50:37.6850841Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6850991Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:50:37.6851039Z Traceback (most recent call last): 2025-12-04T12:50:37.6851205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6851249Z self._join_processes(fn) 2025-12-04T12:50:37.6851424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6851514Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6851697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6851741Z raise RuntimeError(error) 2025-12-04T12:50:37.6851824Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.6851870Z Traceback (most recent call last): 2025-12-04T12:50:37.6852032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6852074Z getattr(self, test_name)() 2025-12-04T12:50:37.6852233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6852269Z fn() 2025-12-04T12:50:37.6852421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6852481Z method(*args, **kwargs) 2025-12-04T12:50:37.6852644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6852685Z method(*args, **kwargs) 2025-12-04T12:50:37.6852835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6852873Z with policy(): 2025-12-04T12:50:37.6853025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6853068Z raise RuntimeError(msg) 2025-12-04T12:50:37.6853467Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3607101440. 2025-12-04T12:50:37.6853470Z 2025-12-04T12:50:37.6853549Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6853856Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6853858Z 2025-12-04T12:50:37.6853946Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6853948Z 2025-12-04T12:50:37.6853950Z 2025-12-04T12:50:37.6854026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6854113Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6854384Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-962687944c0b89e2.xml - 2025-12-04T12:50:37.6854447Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6854764Z FAILED [9.5141s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.6854811Z Traceback (most recent call last): 2025-12-04T12:50:37.6854976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6855019Z getattr(self, test_name)() 2025-12-04T12:50:37.6855181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6855215Z fn() 2025-12-04T12:50:37.6855368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6855429Z method(*args, **kwargs) 2025-12-04T12:50:37.6855582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6855622Z method(*args, **kwargs) 2025-12-04T12:50:37.6855773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6855811Z with policy(): 2025-12-04T12:50:37.6855963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6856005Z raise RuntimeError(msg) 2025-12-04T12:50:37.6856404Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3607101440. 
2025-12-04T12:50:37.6856418Z 2025-12-04T12:50:37.6856497Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6856811Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6856813Z 2025-12-04T12:50:37.6856901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6856964Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6857027Z ======================= 1 failed, 7 deselected in 9.52s ======================== 2025-12-04T12:50:37.6857064Z Got exit code 1 2025-12-04T12:50:37.6857106Z Retrying single test... 2025-12-04T12:50:37.6857331Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1de42d06e632f778.xml 2025-12-04T12:50:37.6857393Z ============================= test session starts ============================== 2025-12-04T12:50:37.6857506Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6857546Z cachedir: .pytest_cache 2025-12-04T12:50:37.6857705Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6857751Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6857793Z configfile: pytest.ini 2025-12-04T12:50:37.6857955Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6858029Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.6858327Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6858373Z Running 1 items in this shard 2025-12-04T12:50:37.6858377Z 2025-12-04T12:50:37.6858748Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 12:47:49.827000 401875 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 401944 2025-12-04T12:50:37.6858904Z I1204 12:47:49.828000 401875 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 401945 2025-12-04T12:50:37.6859055Z I1204 12:47:49.828000 401875 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 401946 2025-12-04T12:50:37.6859206Z I1204 12:47:49.829000 401875 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 401947 2025-12-04T12:50:37.6859955Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.6860003Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6860673Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6860730Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6861397Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6861459Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6862129Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6862174Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6862671Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6862720Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6863212Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6863263Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6863751Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6863797Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6864308Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6864357Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6864493Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6864650Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6864933Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6865079Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6865360Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6865508Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6865782Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6865922Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6866191Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6866331Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6866603Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6866732Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6867003Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6867144Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6867665Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3449815040. 
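The UserWarning repeated above comes from torch/distributed/fsdp/_optim_utils.py line 1190 and names its own replacement. A sketch of the swap it asks for; note that both helpers are underscore-prefixed internals, and the import path below is an assumption, not verified against this build:

    # Swap suggested by the UserWarning itself. `_get_object_coll_device`
    # is a private internal, and this import path is an assumption based
    # on where the object-collectives helpers usually live.
    from torch.distributed.distributed_c10d import _get_object_coll_device

    def object_coll_device(group):
        # Before (deprecated): device = _get_pg_default_device(group)
        return _get_object_coll_device(group)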
2025-12-04T12:50:37.6867778Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6867969Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6868394Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6868503Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6868732Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6868892Z E1204 12:47:58.085000 401944 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6869022Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6869175Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6869453Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6869601Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6869936Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6870068Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6870339Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6870481Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6870750Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6870892Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6871161Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6871290Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6871561Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6871701Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6872222Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1266679808 and is now 3284140032. 2025-12-04T12:50:37.6872332Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6872521Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6872947Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6873083Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6873288Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6873445Z E1204 12:47:58.089000 401947 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6873576Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6873729Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6874010Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6874173Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6874463Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6874579Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6874849Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6874993Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6875263Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:50:37.6875405Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6875674Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6875802Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6876074Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6876215Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6876738Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3284140032. 2025-12-04T12:50:37.6876848Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6877041Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6880427Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:50:37.6880560Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6880770Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6880932Z E1204 12:47:58.107000 401946 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6881064Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6881220Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6881504Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6881687Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6881965Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6882084Z E1204 12:47:58.112000 401945 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6882353Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6882502Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6882777Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6882917Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6883187Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6883316Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6883590Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6883733Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6884254Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3284140032. 
2025-12-04T12:50:37.6884365Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.6884553Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.6885008Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda
2025-12-04T12:50:37.6885118Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.6885322Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.6885479Z E1204 12:47:58.112000 401945 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T12:50:37.6885523Z FAILED [9.5146s] [100%]
2025-12-04T12:50:37.6885526Z
2025-12-04T12:50:37.6885586Z =================================== FAILURES ===================================
2025-12-04T12:50:37.6885757Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _
2025-12-04T12:50:37.6885820Z Traceback (most recent call last):
2025-12-04T12:50:37.6885985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T12:50:37.6886031Z self._join_processes(fn)
2025-12-04T12:50:37.6886205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T12:50:37.6886262Z self._check_return_codes(fn, elapsed_time)
2025-12-04T12:50:37.6886440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T12:50:37.6886487Z raise RuntimeError(error)
2025-12-04T12:50:37.6886569Z RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T12:50:37.6886616Z Traceback (most recent call last):
2025-12-04T12:50:37.6886780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.6886825Z getattr(self, test_name)()
2025-12-04T12:50:37.6886984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.6887022Z fn()
2025-12-04T12:50:37.6887174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6887217Z method(*args, **kwargs)
2025-12-04T12:50:37.6887367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6887410Z method(*args, **kwargs)
2025-12-04T12:50:37.6887560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.6887602Z with policy():
2025-12-04T12:50:37.6887754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.6887798Z raise RuntimeError(msg)
2025-12-04T12:50:37.6888200Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1266679808 and is now 3284140032.
2025-12-04T12:50:37.6888203Z
2025-12-04T12:50:37.6888280Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.6888584Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda
2025-12-04T12:50:37.6888587Z
2025-12-04T12:50:37.6888699Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.6888702Z
2025-12-04T12:50:37.6888704Z
2025-12-04T12:50:37.6888783Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:50:37.6888873Z Process 3 terminated with exit code 10, terminating remaining processes.
2025-12-04T12:50:37.6889150Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1de42d06e632f778.xml -
2025-12-04T12:50:37.6889212Z =========================== short test summary info ============================
2025-12-04T12:50:37.6889529Z FAILED [9.5146s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T12:50:37.6889579Z Traceback (most recent call last):
2025-12-04T12:50:37.6889802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.6889864Z getattr(self, test_name)()
2025-12-04T12:50:37.6890025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.6890061Z fn()
2025-12-04T12:50:37.6890212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6890254Z method(*args, **kwargs)
2025-12-04T12:50:37.6890406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.6890447Z method(*args, **kwargs)
2025-12-04T12:50:37.6890596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.6890637Z with policy():
2025-12-04T12:50:37.6890789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.6890832Z raise RuntimeError(msg)
2025-12-04T12:50:37.6891232Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1266679808 and is now 3284140032.
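The repeated RuntimeError comes from the harness's CUDA memory leak checker, the policy() context manager entered at common_utils.py:3328 in the traceback above. A minimal sketch of the before/after comparison it describes, using only public torch.cuda calls; check_for_leak and its raise-on-any-growth rule are illustrative assumptions, not the harness's actual implementation:

import torch

def check_for_leak(fn, device=0):
    # Snapshot caching-allocator and driver-level usage before the test body.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)  # caching allocator bytes
    free, total = torch.cuda.mem_get_info(device)       # driver's view of the device
    driver_before = total - free
    fn()
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free, total = torch.cuda.mem_get_info(device)
    driver_after = total - free
    # The real checker only reports when the driver numbers corroborate the
    # allocator growth; this sketch raises on any allocator growth at all.
    if alloc_after > alloc_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {alloc_before} and is now "
            f"reported as {alloc_after} on device {device}. CUDA driver allocated "
            f"memory was {driver_before} and is now {driver_after}."
        )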
2025-12-04T12:50:37.6891236Z
2025-12-04T12:50:37.6891312Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.6891616Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda
2025-12-04T12:50:37.6891619Z
2025-12-04T12:50:37.6891709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.6891776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:50:37.6891839Z ======================= 1 failed, 7 deselected in 9.52s ========================
2025-12-04T12:50:37.6891879Z Got exit code 1
2025-12-04T12:50:37.6892130Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda
2025-12-04T12:50:37.6892261Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:50:37.6892488Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e08f770814ebba57.xml
2025-12-04T12:50:37.6892548Z ============================= test session starts ==============================
2025-12-04T12:50:37.6892697Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T12:50:37.6892742Z cachedir: .pytest_cache
2025-12-04T12:50:37.6892901Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:50:37.6892950Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T12:50:37.6892993Z configfile: pytest.ini
2025-12-04T12:50:37.6893157Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T12:50:37.6893232Z collecting ... collected 8 items / 4 deselected / 4 selected
2025-12-04T12:50:37.6893285Z stepcurrent: skipping 4 already run items.
2025-12-04T12:50:37.6893331Z Running 4 items in this shard
2025-12-04T12:50:37.6893333Z
2025-12-04T12:50:37.6893721Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 12:48:01.710000 402345 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 402414
2025-12-04T12:50:37.6893911Z I1204 12:48:01.711000 402345 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 402415
2025-12-04T12:50:37.6894064Z I1204 12:48:01.711000 402345 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 402416
2025-12-04T12:50:37.6894218Z I1204 12:48:01.712000 402345 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 402417
2025-12-04T12:50:37.6894906Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP.
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6894954Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6895623Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6895667Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6896334Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6896379Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6897046Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6897089Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6897608Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6897663Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6898155Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6898203Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6898691Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:50:37.6898766Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6899255Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6899301Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6900006Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6900052Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6900716Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6900760Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6901250Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6901315Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6901803Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6901862Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6902138Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6902185Z local_shape = tensor.shape 2025-12-04T12:50:37.6902421Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6902465Z local_shape = tensor.shape 2025-12-04T12:50:37.6902699Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:50:37.6902737Z tensor.shape, 2025-12-04T12:50:37.6902970Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6903010Z tensor.dtype, 2025-12-04T12:50:37.6903258Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6903313Z tensor.shape, 2025-12-04T12:50:37.6903542Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6903581Z tensor.dtype, 2025-12-04T12:50:37.6904252Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6904298Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6904970Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6905013Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6905503Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6905563Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6906047Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6906104Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6906339Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6906384Z local_shape = tensor.shape 2025-12-04T12:50:37.6906637Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:50:37.6906678Z tensor.shape, 2025-12-04T12:50:37.6906914Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6906952Z tensor.dtype, 2025-12-04T12:50:37.6907183Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6907227Z local_shape = tensor.shape 2025-12-04T12:50:37.6907458Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6907497Z tensor.shape, 2025-12-04T12:50:37.6907728Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6907791Z tensor.dtype, 2025-12-04T12:50:37.6907928Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6908087Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6908370Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6908520Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6908803Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6908921Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6909196Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6909338Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6909608Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6909785Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6910059Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6910192Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6910463Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
2705, in __exit__ 2025-12-04T12:50:37.6910606Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6911173Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1262485504 and is now 3315597312. 2025-12-04T12:50:37.6911288Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6911479Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6911921Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6912031Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6912236Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6912427Z E1204 12:48:11.941000 402417 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6912558Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6912712Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6912990Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6913138Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6913419Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6913534Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6913806Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6913948Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6914219Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6914362Z E1204 12:48:11.952000 402415 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6914633Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6914761Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6915033Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6915175Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6915726Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:50:37.6915838Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6916026Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6916461Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6916582Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6916797Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6916957Z E1204 12:48:11.952000 402415 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6917087Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6917240Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6917517Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6917667Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6917944Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6918060Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T12:50:37.6918329Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6918469Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6918743Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6918884Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6919152Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6919281Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6919552Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6919786Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6920317Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3315597312. 
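The "Started process N with pid ..." and "exiting process N with exit code: 10" records come from the multiprocess harness in common_distributed.py: it spawns one child per rank, joins them, and turns a nonzero child exit code into the parent-side "Process N exited with error code 10" RuntimeError seen in the tracebacks. A rough sketch of that spawn/join/check shape, with hypothetical names (_rank_main, run_in_subprocesses) and none of the harness's pipes, logging, or timeouts:

import torch.multiprocessing as mp

def _rank_main(rank: int) -> None:
    # Test body for one rank; an uncaught failure becomes a nonzero exit code.
    pass

def run_in_subprocesses(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_rank_main, args=(rank,)) for rank in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run_in_subprocesses()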
2025-12-04T12:50:37.6920425Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6920613Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6921050Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6921184Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6921387Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6921545Z E1204 12:48:11.981000 402416 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6921674Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6921827Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6922107Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6922255Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6922531Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6922646Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6922914Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6923058Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6923462Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6923603Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6923872Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6924000Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6924298Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6924441Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6924969Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3705667584. 2025-12-04T12:50:37.6925078Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6925266Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6925713Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6925832Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6926035Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6926194Z E1204 12:48:12.029000 402414 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6926237Z FAILED [11.5160s] [ 25%] 2025-12-04T12:50:37.6926241Z 2025-12-04T12:50:37.6926298Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6926462Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T12:50:37.6926511Z Traceback (most recent call last): 2025-12-04T12:50:37.6926675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6926719Z self._join_processes(fn) 2025-12-04T12:50:37.6926894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6926949Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6927128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6927173Z raise RuntimeError(error) 2025-12-04T12:50:37.6927254Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6927303Z Traceback (most recent call last): 2025-12-04T12:50:37.6927466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6927509Z getattr(self, test_name)() 2025-12-04T12:50:37.6927669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6927706Z fn() 2025-12-04T12:50:37.6927859Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6927900Z method(*args, **kwargs) 2025-12-04T12:50:37.6928054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6928094Z method(*args, **kwargs) 2025-12-04T12:50:37.6928246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6928307Z with policy(): 2025-12-04T12:50:37.6928463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6928504Z raise RuntimeError(msg) 2025-12-04T12:50:37.6928918Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:50:37.6928921Z 2025-12-04T12:50:37.6928999Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6929311Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6929327Z 2025-12-04T12:50:37.6929418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6929433Z 2025-12-04T12:50:37.6929435Z 2025-12-04T12:50:37.6929511Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6929600Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:50:37.6929922Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e08f770814ebba57.xml - 2025-12-04T12:50:37.6929984Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6930314Z FAILED [11.5160s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.6930362Z Traceback (most recent call last): 2025-12-04T12:50:37.6930530Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6930573Z getattr(self, test_name)() 2025-12-04T12:50:37.6930734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6930769Z fn() 2025-12-04T12:50:37.6930920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6930961Z method(*args, **kwargs) 2025-12-04T12:50:37.6931112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6931151Z method(*args, **kwargs) 2025-12-04T12:50:37.6931304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6931342Z with policy(): 2025-12-04T12:50:37.6931495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6931536Z raise RuntimeError(msg) 2025-12-04T12:50:37.6931949Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:50:37.6931951Z 2025-12-04T12:50:37.6932027Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6932339Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6932373Z 2025-12-04T12:50:37.6932462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6932526Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6932590Z ======================= 1 failed, 4 deselected in 11.53s ======================= 2025-12-04T12:50:37.6932627Z Got exit code 1 2025-12-04T12:50:37.6932669Z Retrying single test... 
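"Got exit code 1" followed by "Retrying single test..." (and, for the earlier test, "FAILED CONSISTENTLY ... due to continue-through-error being set") reflects the CI runner's retry policy: a failing test is rerun in isolation, and only a repeat failure is marked consistent before the shard moves on. A hypothetical reconstruction of that decision, not the actual run_test.py code:

import subprocess

def classify(test_id: str) -> str:
    # Run the failing test alone and classify the outcome the way this log does.
    cmd = ["python", "-m", "pytest", "-x", test_id]
    if subprocess.run(cmd).returncode == 0:
        return "passed"
    print("Retrying single test...")
    rerun = subprocess.run(cmd).returncode
    return "FAILED CONSISTENTLY" if rerun != 0 else "flaky"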
2025-12-04T12:50:37.6932896Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-8fc5f1fc00aa8324.xml 2025-12-04T12:50:37.6932956Z ============================= test session starts ============================== 2025-12-04T12:50:37.6933070Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6933112Z cachedir: .pytest_cache 2025-12-04T12:50:37.6933272Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6933357Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6933398Z configfile: pytest.ini 2025-12-04T12:50:37.6933561Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6933633Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.6933938Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6933983Z Running 1 items in this shard 2025-12-04T12:50:37.6933985Z 2025-12-04T12:50:37.6934369Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 12:48:15.821000 402883 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 402952 2025-12-04T12:50:37.6934526Z I1204 12:48:15.822000 402883 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 402953 2025-12-04T12:50:37.6934678Z I1204 12:48:15.822000 402883 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 402954 2025-12-04T12:50:37.6934830Z I1204 12:48:15.823000 402883 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 402955 2025-12-04T12:50:37.6935513Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6935561Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6936229Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6936272Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6936961Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6937006Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6937674Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6937717Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6938218Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6938294Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6938782Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6938831Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6939318Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6939366Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6939893Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6939940Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6940618Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
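The FutureWarning repeated above (raised at test_hsdp_dtensor_state_dict.py:118 and :130) asks callers to replace FSDP.set_state_dict_type() with the get_state_dict()/set_state_dict() pair it links to. A minimal sketch of the suggested replacement, using a plain module as a stand-in for the FSDP-wrapped model and omitting the options plumbing:

import torch
import torch.nn as nn
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

model = nn.Linear(4, 4)  # stand-in for the FSDP-wrapped module in the test
optim = torch.optim.SGD(model.parameters(), lr=0.1)

# One API surface for FSDP1, FSDP2 and DDP, which is why the warning points here.
model_sd, optim_sd = get_state_dict(model, optim)
set_state_dict(model, optim, model_state_dict=model_sd, optim_state_dict=optim_sd)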
2025-12-04T12:50:37.6940662Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6941327Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6941370Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6941883Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6941946Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6942431Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6942490Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6942729Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6942803Z local_shape = tensor.shape 2025-12-04T12:50:37.6943038Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6943081Z local_shape = tensor.shape 2025-12-04T12:50:37.6943314Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6943351Z tensor.shape, 2025-12-04T12:50:37.6943584Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6943620Z tensor.dtype, 2025-12-04T12:50:37.6943855Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6943892Z tensor.shape, 2025-12-04T12:50:37.6944125Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6944161Z tensor.dtype, 2025-12-04T12:50:37.6944837Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6944882Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6945552Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6945596Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6946081Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6946179Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6946663Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6946719Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6946954Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6946996Z local_shape = tensor.shape 2025-12-04T12:50:37.6947228Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6947277Z tensor.shape, 2025-12-04T12:50:37.6947523Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6947560Z tensor.dtype, 2025-12-04T12:50:37.6947792Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6947834Z local_shape = tensor.shape 2025-12-04T12:50:37.6948065Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6948101Z tensor.shape, 2025-12-04T12:50:37.6948335Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:50:37.6948372Z tensor.dtype, 2025-12-04T12:50:37.6948508Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6948664Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6948946Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6949094Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6949372Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6949493Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6949807Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6949952Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6950222Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6950362Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6950659Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6950790Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6951061Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6951201Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6951736Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1262485504 and is now 3399483392. 
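The other recurring warning, "Please use DTensor instead and we are deprecating ShardedTensor" from _state_dict_utils.py, points at the DTensor representation of sharded state. A sketch of building a sharded DTensor directly, assuming an already-initialized 4-rank process group as in this test and a recent PyTorch where torch.distributed.tensor is the public module path:

import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

# Assumes torch.distributed.init_process_group() already ran on 4 ranks.
mesh = init_device_mesh("cuda", (4,))
full = torch.randn(16, 8)
sharded = distribute_tensor(full, mesh, placements=[Shard(0)])  # rows split across ranks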
2025-12-04T12:50:37.6951890Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6952079Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6952515Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6952624Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6952830Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6952991Z E1204 12:48:25.618000 402955 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6953122Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6953274Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6953552Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6953699Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6953979Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6954094Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6954362Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6954503Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6954772Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6954935Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6955204Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6955333Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6955603Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6955743Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6956271Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3456106496. 2025-12-04T12:50:37.6956403Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6956592Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6957025Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6957134Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6957338Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6957496Z E1204 12:48:25.634000 402953 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.6957626Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6957777Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6958057Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6958206Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6958483Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6958598Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6958866Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6959007Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6959298Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6959440Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6959745Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6959873Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6960144Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6960285Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6960844Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3399483392. 2025-12-04T12:50:37.6960952Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6961141Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6961579Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6961687Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6961889Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6962046Z E1204 12:48:25.642000 402954 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6962176Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6962326Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6962607Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6962754Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6963032Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6963146Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6963414Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6963582Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6963853Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6963995Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6964261Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6964392Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6964664Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6964828Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6965355Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3705667584. 
2025-12-04T12:50:37.6965462Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6965651Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6966084Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6966194Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6966396Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6966553Z E1204 12:48:26.157000 402952 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.6966595Z FAILED [11.2179s] [100%] 2025-12-04T12:50:37.6966597Z 2025-12-04T12:50:37.6966652Z =================================== FAILURES =================================== 2025-12-04T12:50:37.6966815Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T12:50:37.6966863Z Traceback (most recent call last): 2025-12-04T12:50:37.6967026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.6967069Z self._join_processes(fn) 2025-12-04T12:50:37.6967243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.6967297Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.6967476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.6967520Z raise RuntimeError(error) 2025-12-04T12:50:37.6967601Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.6967678Z Traceback (most recent call last): 2025-12-04T12:50:37.6967840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6967885Z getattr(self, test_name)() 2025-12-04T12:50:37.6968045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6968081Z fn() 2025-12-04T12:50:37.6968232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6968274Z method(*args, **kwargs) 2025-12-04T12:50:37.6968425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6968465Z method(*args, **kwargs) 2025-12-04T12:50:37.6968615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6968681Z with policy(): 2025-12-04T12:50:37.6968847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:50:37.6968889Z raise RuntimeError(msg) 2025-12-04T12:50:37.6969301Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1262485504 and is now 3399483392. 2025-12-04T12:50:37.6969303Z 2025-12-04T12:50:37.6969380Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6969734Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6969738Z 2025-12-04T12:50:37.6969828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6969831Z 2025-12-04T12:50:37.6969833Z 2025-12-04T12:50:37.6969910Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.6969998Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.6970271Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-8fc5f1fc00aa8324.xml - 2025-12-04T12:50:37.6970332Z =========================== short test summary info ============================ 2025-12-04T12:50:37.6970660Z FAILED [11.2179s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.6970709Z Traceback (most recent call last): 2025-12-04T12:50:37.6970876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6970920Z getattr(self, test_name)() 2025-12-04T12:50:37.6971081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6971115Z fn() 2025-12-04T12:50:37.6971268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6971308Z method(*args, **kwargs) 2025-12-04T12:50:37.6971459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6971499Z method(*args, **kwargs) 2025-12-04T12:50:37.6971649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6971716Z with policy(): 2025-12-04T12:50:37.6971869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6971911Z raise RuntimeError(msg) 2025-12-04T12:50:37.6972321Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1262485504 and is now 3399483392. 
2025-12-04T12:50:37.6972323Z 2025-12-04T12:50:37.6972400Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6972711Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6972728Z 2025-12-04T12:50:37.6972816Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6972893Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.6972956Z ======================= 1 failed, 7 deselected in 11.23s ======================= 2025-12-04T12:50:37.6972993Z Got exit code 1 2025-12-04T12:50:37.6973034Z Retrying single test... 2025-12-04T12:50:37.6973262Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-f738f328794c222e.xml 2025-12-04T12:50:37.6973321Z ============================= test session starts ============================== 2025-12-04T12:50:37.6973435Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.6973475Z cachedir: .pytest_cache 2025-12-04T12:50:37.6973635Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.6973683Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.6973724Z configfile: pytest.ini 2025-12-04T12:50:37.6973886Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.6973959Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.6974265Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6974309Z Running 1 items in this shard 2025-12-04T12:50:37.6974311Z 2025-12-04T12:50:37.6974694Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 12:48:29.685000 403421 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 403490 2025-12-04T12:50:37.6974852Z I1204 12:48:29.686000 403421 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 403491 2025-12-04T12:50:37.6975003Z I1204 12:48:29.686000 403421 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 403492 2025-12-04T12:50:37.6975153Z I1204 12:48:29.687000 403421 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 403493 2025-12-04T12:50:37.6975855Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.6975900Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6976568Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6976611Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6977276Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6977341Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6978008Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6978051Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6978548Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6978599Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6979087Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6979135Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6979621Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6979669Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6980197Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6980243Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.6980948Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6980993Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6981657Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6981699Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6982185Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6982274Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6982757Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6982815Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6983056Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6983103Z local_shape = tensor.shape 2025-12-04T12:50:37.6983338Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6983376Z tensor.shape, 2025-12-04T12:50:37.6983607Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6983644Z tensor.dtype, 2025-12-04T12:50:37.6983875Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:50:37.6983918Z local_shape = tensor.shape 2025-12-04T12:50:37.6984149Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6984187Z tensor.shape, 2025-12-04T12:50:37.6984416Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6984453Z tensor.dtype, 2025-12-04T12:50:37.6985122Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6985166Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6985862Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.6985906Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.6986392Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.6986464Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6986711Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6986754Z local_shape = tensor.shape 2025-12-04T12:50:37.6986984Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6987021Z tensor.shape, 2025-12-04T12:50:37.6987254Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6987291Z tensor.dtype, 2025-12-04T12:50:37.6987776Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:50:37.6987835Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.6988069Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6988111Z local_shape = tensor.shape 2025-12-04T12:50:37.6988343Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6988381Z tensor.shape, 2025-12-04T12:50:37.6988613Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.6988651Z tensor.dtype, 2025-12-04T12:50:37.6988789Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6988944Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6989227Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6989374Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6989652Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6989833Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6990105Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6990245Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6990517Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6990657Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6990926Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6991088Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6991358Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6991498Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6992034Z E1204 12:48:39.454000 403493 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1245708288 and is now 3500146688. 2025-12-04T12:50:37.6992146Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6992336Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6992769Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6992878Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6993081Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6993242Z E1204 12:48:39.454000 403493 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.6993373Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6993526Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6993805Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6993952Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6994251Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6994367Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6994635Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6994775Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6995047Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6995198Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6995477Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.6995606Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.6995874Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.6996015Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.6996543Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3565158400. 2025-12-04T12:50:37.6996653Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6996844Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.6997276Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.6997386Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.6997589Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.6997748Z E1204 12:48:39.465000 403492 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.6997876Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.6998029Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.6998308Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.6998475Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.6998754Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.6998869Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.6999137Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:50:37.6999275Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.6999549Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.6999756Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7000025Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7000154Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7000425Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7000567Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7001097Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3707764736. 
2025-12-04T12:50:37.7001205Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7001392Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7001825Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.7001935Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7002139Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7002298Z E1204 12:48:39.965000 403490 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.7002426Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7002579Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7002883Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7003033Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7003309Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7003423Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7003692Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7003834Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7004130Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7004269Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7004539Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7004668Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7004938Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7005081Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7005606Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:50:37.7005715Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7005902Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7006336Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.7006445Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7006649Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7006808Z E1204 12:48:40.003000 403491 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.7006849Z FAILED [11.2167s] [100%] 2025-12-04T12:50:37.7006851Z 2025-12-04T12:50:37.7006908Z =================================== FAILURES =================================== 2025-12-04T12:50:37.7007089Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T12:50:37.7007140Z Traceback (most recent call last): 2025-12-04T12:50:37.7007302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.7007347Z self._join_processes(fn) 2025-12-04T12:50:37.7007519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.7007575Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.7007753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.7007798Z raise RuntimeError(error) 2025-12-04T12:50:37.7007877Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.7007934Z Traceback (most recent call last): 2025-12-04T12:50:37.7008097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7008152Z getattr(self, test_name)() 2025-12-04T12:50:37.7008311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7008346Z fn() 2025-12-04T12:50:37.7008497Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7008539Z method(*args, **kwargs) 2025-12-04T12:50:37.7008690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7008730Z method(*args, **kwargs) 2025-12-04T12:50:37.7008882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7008922Z with policy(): 2025-12-04T12:50:37.7009074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7009117Z raise RuntimeError(msg) 2025-12-04T12:50:37.7009531Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1245708288 and is now 3500146688. 2025-12-04T12:50:37.7009534Z 2025-12-04T12:50:37.7009610Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7009976Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.7009979Z 2025-12-04T12:50:37.7010069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7010072Z 2025-12-04T12:50:37.7010074Z 2025-12-04T12:50:37.7010149Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.7010239Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:50:37.7010510Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-f738f328794c222e.xml - 2025-12-04T12:50:37.7010571Z =========================== short test summary info ============================ 2025-12-04T12:50:37.7010896Z FAILED [11.2167s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.7010945Z Traceback (most recent call last): 2025-12-04T12:50:37.7011138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7011185Z getattr(self, test_name)() 2025-12-04T12:50:37.7011345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7011381Z fn() 2025-12-04T12:50:37.7011531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7011573Z method(*args, **kwargs) 2025-12-04T12:50:37.7011724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7011763Z method(*args, **kwargs) 2025-12-04T12:50:37.7011913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7011965Z with policy(): 2025-12-04T12:50:37.7012118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7012177Z raise RuntimeError(msg) 2025-12-04T12:50:37.7012588Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1245708288 and is now 3500146688. 2025-12-04T12:50:37.7012590Z 2025-12-04T12:50:37.7012664Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7012977Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.7012980Z 2025-12-04T12:50:37.7013069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7013134Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
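The FutureWarnings repeated throughout this run name the replacement checkpoint APIs directly (see the API doc and tutorial URLs quoted in the warnings). A minimal sketch of the migration they recommend — `model` and `optimizer` here are hypothetical stand-ins for the FSDP-wrapped objects under test:

    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    def roundtrip_state_dict(model, optimizer):
        # Instead of FSDP.set_state_dict_type(...), the parallelism-agnostic
        # APIs take the wrapped module and optimizer directly and work across
        # FSDP1, FSDP2, and DDP, per the deprecation message above.
        model_sd, optim_sd = get_state_dict(model, optimizer)   # save path
        set_state_dict(
            model,
            optimizer,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
        )                                                       # load path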
2025-12-04T12:50:37.7013196Z ======================= 1 failed, 7 deselected in 11.23s ======================= 2025-12-04T12:50:37.7013235Z Got exit code 1 2025-12-04T12:50:37.7013497Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:50:37.7013627Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:50:37.7013853Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-4a518314a42e58d2.xml 2025-12-04T12:50:37.7013910Z ============================= test session starts ============================== 2025-12-04T12:50:37.7014026Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.7014068Z cachedir: .pytest_cache 2025-12-04T12:50:37.7014225Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.7014271Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.7014312Z configfile: pytest.ini 2025-12-04T12:50:37.7014473Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.7014547Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T12:50:37.7014599Z stepcurrent: skipping 5 already run items. 2025-12-04T12:50:37.7014644Z Running 3 items in this shard 2025-12-04T12:50:37.7014646Z 2025-12-04T12:50:37.7015049Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 12:48:43.573000 403959 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 404028 2025-12-04T12:50:37.7015208Z I1204 12:48:43.574000 403959 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 404029 2025-12-04T12:50:37.7015361Z I1204 12:48:43.574000 403959 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 404030 2025-12-04T12:50:37.7015512Z I1204 12:48:43.575000 403959 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 404031 2025-12-04T12:50:37.7016199Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7016271Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7016938Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7016981Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7017647Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7017692Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7018355Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7018398Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7018896Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7018947Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7019435Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7019482Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7020040Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7020089Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7020575Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:50:37.7020621Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7021299Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7021374Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7022041Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7022083Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7022574Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7022635Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.7023119Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7023178Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.7023854Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7023898Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7024582Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.7024627Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7025115Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7025172Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.7025658Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7025724Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.7025977Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7026021Z local_shape = tensor.shape 2025-12-04T12:50:37.7026256Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7026292Z tensor.shape, 2025-12-04T12:50:37.7026525Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7026568Z local_shape = tensor.shape 2025-12-04T12:50:37.7026800Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7026838Z tensor.dtype, 2025-12-04T12:50:37.7027068Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7027106Z tensor.shape, 2025-12-04T12:50:37.7027336Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7027373Z tensor.dtype, 2025-12-04T12:50:37.7027602Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7027644Z local_shape = tensor.shape 2025-12-04T12:50:37.7027875Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7027914Z tensor.shape, 2025-12-04T12:50:37.7028143Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:50:37.7028181Z tensor.dtype, 2025-12-04T12:50:37.7028412Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7028455Z local_shape = tensor.shape 2025-12-04T12:50:37.7028686Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7028724Z tensor.shape, 2025-12-04T12:50:37.7028976Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7029015Z tensor.dtype, 2025-12-04T12:50:37.7029152Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7029307Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7029591Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7029774Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7030056Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7030208Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7030480Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7030623Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7030894Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7031034Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7031303Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7031434Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7031702Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7031843Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7032376Z E1204 12:48:52.786000 404031 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1113587712 and is now 3477078016. 2025-12-04T12:50:37.7032485Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7032675Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7033110Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7033219Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7033459Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7033621Z E1204 12:48:52.786000 404031 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.7033752Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7033904Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7034183Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7034330Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7034619Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7034747Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7035016Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7035156Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7035428Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7035570Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7035838Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7035968Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7036237Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7036378Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7036906Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3542089728. 2025-12-04T12:50:37.7037015Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7037204Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7037654Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7037765Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7037969Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7038128Z E1204 12:48:52.844000 404030 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.7038257Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7038410Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7038691Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7038859Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7039136Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7039250Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7039518Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:50:37.7039657Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7039962Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7040105Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7040372Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7040501Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7040769Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7040913Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7041439Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3292528640. 
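Each rank runs the test body in its own process; when the leak check trips, the worker exits with code 10 (the "exiting process N with exit code: 10" lines), and the parent joins its children and turns the first nonzero exit into the "Process N exited with error code 10" RuntimeError. A rough sketch of that parent/worker protocol with hypothetical names; the real logic lives in torch/testing/_internal/common_distributed.py (_join_processes / _check_return_codes):

    import multiprocessing as mp
    import sys

    MEM_LEAK_EXIT_CODE = 10  # matches the exit code reported in this log

    def worker(rank: int) -> None:
        # Placeholder for the per-rank test body; on a detected leak the
        # real harness logs the traceback and exits with the code above.
        sys.exit(0)

    def run_test(world_size: int = 4) -> None:
        procs = [mp.Process(target=worker, args=(r,)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Mirror of _check_return_codes: the first nonzero exit fails the case.
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(
                    f"Process {rank} exited with error code {p.exitcode}"
                )

    if __name__ == "__main__":
        run_test()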
2025-12-04T12:50:37.7041546Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7041735Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7042191Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7042301Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7042504Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7042662Z E1204 12:48:53.298000 404029 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.7042790Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7042942Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7043234Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7043396Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7043673Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7043787Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7044058Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7044199Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7044470Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7044609Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7044880Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7045009Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7045279Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7045421Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7045946Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3676307456. 2025-12-04T12:50:37.7046052Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7046260Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7046692Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7046799Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7047002Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7047159Z E1204 12:48:53.365000 404028 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.7047200Z FAILED [10.5201s] [ 33%] 2025-12-04T12:50:37.7047202Z 2025-12-04T12:50:37.7047270Z =================================== FAILURES =================================== 2025-12-04T12:50:37.7047443Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T12:50:37.7047491Z Traceback (most recent call last): 2025-12-04T12:50:37.7047654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.7047698Z self._join_processes(fn) 2025-12-04T12:50:37.7047871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.7047926Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.7048104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.7048149Z raise RuntimeError(error) 2025-12-04T12:50:37.7048233Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.7048278Z Traceback (most recent call last): 2025-12-04T12:50:37.7048440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7048483Z getattr(self, test_name)() 2025-12-04T12:50:37.7048643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7048677Z fn() 2025-12-04T12:50:37.7048829Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7048870Z method(*args, **kwargs) 2025-12-04T12:50:37.7049021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7049061Z method(*args, **kwargs) 2025-12-04T12:50:37.7049214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7049253Z with policy(): 2025-12-04T12:50:37.7049406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7049447Z raise RuntimeError(msg) 2025-12-04T12:50:37.7049904Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1113587712 and is now 3477078016. 2025-12-04T12:50:37.7049907Z 2025-12-04T12:50:37.7049983Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7050328Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7050332Z 2025-12-04T12:50:37.7050423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7050425Z 2025-12-04T12:50:37.7050426Z 2025-12-04T12:50:37.7050502Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.7050590Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:50:37.7050860Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-4a518314a42e58d2.xml - 2025-12-04T12:50:37.7050923Z =========================== short test summary info ============================ 2025-12-04T12:50:37.7051247Z FAILED [10.5201s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.7051309Z Traceback (most recent call last): 2025-12-04T12:50:37.7051490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7051534Z getattr(self, test_name)() 2025-12-04T12:50:37.7051695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7051731Z fn() 2025-12-04T12:50:37.7051883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7051923Z method(*args, **kwargs) 2025-12-04T12:50:37.7052075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7052114Z method(*args, **kwargs) 2025-12-04T12:50:37.7052265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7052303Z with policy(): 2025-12-04T12:50:37.7052456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7052497Z raise RuntimeError(msg) 2025-12-04T12:50:37.7052909Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1113587712 and is now 3477078016. 2025-12-04T12:50:37.7052911Z 2025-12-04T12:50:37.7052985Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7053299Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7053302Z 2025-12-04T12:50:37.7053390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7053454Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.7053517Z ======================= 1 failed, 5 deselected in 10.53s ======================= 2025-12-04T12:50:37.7053555Z Got exit code 1 2025-12-04T12:50:37.7053596Z Retrying single test... 
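The "Got exit code 1" / "Retrying single test..." / "FAILED CONSISTENTLY" progression shows the CI driver's retry policy: after a shard-level pytest run fails, it reruns just the failing test, and only a second failure marks the test consistently failed (continue-through-error then moves on rather than aborting the job). A simplified sketch of that loop under stated assumptions; the helper, flags, and selectors are illustrative, while the two environment variables are taken from the repro lines above:

    import os
    import subprocess

    ENV = {
        **os.environ,
        "PYTORCH_TEST_WITH_ROCM": "1",
        "PYTORCH_TEST_CUDA_MEM_LEAK_CHECK": "1",
    }

    def run_pytest(selector: str) -> int:
        # Hypothetical wrapper; the real runner also manages stepcurrent
        # state and the XML report paths named in this log.
        return subprocess.call(["python", "-m", "pytest", "-x", selector], env=ENV)

    def run_with_retry(shard_selector: str, failing_test: str) -> None:
        if run_pytest(shard_selector) == 0:
            return
        print("Retrying single test...")
        if run_pytest(failing_test) != 0:
            print(f"FAILED CONSISTENTLY: {failing_test}")
            # continue-through-error: record the failure and keep running
            # the rest of the shard instead of failing the whole job here.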
2025-12-04T12:50:37.7053825Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6e358ba8fbc77e3f.xml 2025-12-04T12:50:37.7053884Z ============================= test session starts ============================== 2025-12-04T12:50:37.7053997Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.7054038Z cachedir: .pytest_cache 2025-12-04T12:50:37.7054217Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.7054265Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.7054306Z configfile: pytest.ini 2025-12-04T12:50:37.7054469Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.7054540Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.7054844Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7054887Z Running 1 items in this shard 2025-12-04T12:50:37.7054889Z 2025-12-04T12:50:37.7055273Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 12:48:56.811000 404497 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 404566 2025-12-04T12:50:37.7055452Z I1204 12:48:56.812000 404497 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 404567 2025-12-04T12:50:37.7055605Z I1204 12:48:56.812000 404497 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 404568 2025-12-04T12:50:37.7055756Z I1204 12:48:56.813000 404497 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 404569 2025-12-04T12:50:37.7056439Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7056486Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7057156Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7057199Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7057868Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7057911Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7058578Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7058620Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7059141Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7059195Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7059683Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7059773Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7060261Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7060340Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7060826Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7060872Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7061552Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.7061595Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7062261Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7062304Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7062794Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7062856Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.7063341Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7063399Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.7064102Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7064146Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7064816Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7064868Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7065375Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7065434Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.7065915Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7065975Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:50:37.7066213Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7066259Z local_shape = tensor.shape 2025-12-04T12:50:37.7066493Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7066531Z tensor.shape, 2025-12-04T12:50:37.7066763Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7066801Z tensor.dtype, 2025-12-04T12:50:37.7067035Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7067079Z local_shape = tensor.shape 2025-12-04T12:50:37.7067311Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7067347Z tensor.shape, 2025-12-04T12:50:37.7067577Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7067614Z tensor.dtype, 2025-12-04T12:50:37.7067845Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7067886Z local_shape = tensor.shape 2025-12-04T12:50:37.7068140Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7068178Z tensor.shape, 2025-12-04T12:50:37.7068409Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7068445Z tensor.dtype, 2025-12-04T12:50:37.7068678Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7068719Z local_shape = tensor.shape 2025-12-04T12:50:37.7068952Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:50:37.7068989Z tensor.shape, 2025-12-04T12:50:37.7069220Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:50:37.7069279Z tensor.dtype, 2025-12-04T12:50:37.7069415Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7069572Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7069890Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7070039Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7070319Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7070438Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7070707Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7070850Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7071121Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7071261Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7071533Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7071663Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7071933Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7072073Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7072633Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3292528640. 
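The FutureWarning repeated throughout these runs (test_hsdp_dtensor_state_dict.py:118 and :130) points at the replacement APIs: torch.distributed.checkpoint.state_dict.get_state_dict() / set_state_dict() instead of FSDP.set_state_dict_type(). A minimal sketch of that migration, assuming PyTorch >= 2.2 and using a plain module as a stand-in for the FSDP-wrapped model in the test:

    import torch
    from torch import nn
    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    # Toy stand-ins; in the test above these are FSDP-wrapped modules and
    # their optimizers.
    model = nn.Linear(4, 4)
    optim = torch.optim.SGD(model.parameters(), lr=0.1)

    # cpu_offload here plays the role of the test's offload_to_cpu parameter.
    opts = StateDictOptions(cpu_offload=True)
    model_sd, optim_sd = get_state_dict(model, optim, options=opts)
    set_state_dict(
        model,
        optim,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=opts,
    )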
2025-12-04T12:50:37.7072746Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7072935Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7073367Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7073475Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7073696Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7073872Z E1204 12:49:06.143000 404567 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T12:50:37.7074004Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7074157Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7074436Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7074583Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7074862Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7074979Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7075247Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7075388Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7075657Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7075801Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7076073Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7076201Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7076472Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7076611Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7077158Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3292528640.
2025-12-04T12:50:37.7077268Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7077457Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7077889Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7078007Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7078224Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7078382Z E1204 12:49:06.148000 404568 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T12:50:37.7078513Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7078664Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7078943Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7079093Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7079369Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7079484Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7079783Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7079925Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7080194Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7080337Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7080607Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7080735Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7081006Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7081171Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7081700Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3678404608.
2025-12-04T12:50:37.7081807Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7081996Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7082429Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7082567Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7082771Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7082930Z E1204 12:49:06.153000 404566 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T12:50:37.7083061Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7083213Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7083498Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7083645Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7083922Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7084038Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7084305Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7084449Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7084717Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7084857Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7085128Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7085256Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7085546Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7085689Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7086213Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 803209216 and is now 3292528640.
2025-12-04T12:50:37.7086320Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7086510Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7086950Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7087070Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7087272Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7087429Z E1204 12:49:06.235000 404569 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T12:50:37.7087472Z FAILED [10.6157s] [100%]
2025-12-04T12:50:37.7087474Z
2025-12-04T12:50:37.7087531Z =================================== FAILURES ===================================
2025-12-04T12:50:37.7087691Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _
2025-12-04T12:50:37.7087739Z Traceback (most recent call last):
2025-12-04T12:50:37.7087904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T12:50:37.7087948Z self._join_processes(fn)
2025-12-04T12:50:37.7088123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T12:50:37.7088177Z self._check_return_codes(fn, elapsed_time)
2025-12-04T12:50:37.7088356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T12:50:37.7088400Z raise RuntimeError(error)
2025-12-04T12:50:37.7088480Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T12:50:37.7088529Z Traceback (most recent call last):
2025-12-04T12:50:37.7088691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7088734Z getattr(self, test_name)()
2025-12-04T12:50:37.7088893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7088929Z fn()
2025-12-04T12:50:37.7089080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7089122Z method(*args, **kwargs)
2025-12-04T12:50:37.7089273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7089314Z method(*args, **kwargs)
2025-12-04T12:50:37.7089483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7089523Z with policy():
2025-12-04T12:50:37.7089677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7089759Z raise RuntimeError(msg)
2025-12-04T12:50:37.7090169Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3292528640.
2025-12-04T12:50:37.7090172Z
2025-12-04T12:50:37.7090248Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7090558Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7090577Z
2025-12-04T12:50:37.7090680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7090682Z
2025-12-04T12:50:37.7090684Z
2025-12-04T12:50:37.7090761Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:50:37.7090849Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T12:50:37.7091121Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6e358ba8fbc77e3f.xml -
2025-12-04T12:50:37.7091182Z =========================== short test summary info ============================
2025-12-04T12:50:37.7091509Z FAILED [10.6157s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T12:50:37.7091558Z Traceback (most recent call last):
2025-12-04T12:50:37.7091724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7091767Z getattr(self, test_name)()
2025-12-04T12:50:37.7091929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7091965Z fn()
2025-12-04T12:50:37.7092117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7092159Z method(*args, **kwargs)
2025-12-04T12:50:37.7092309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7092349Z method(*args, **kwargs)
2025-12-04T12:50:37.7092500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7092539Z with policy():
2025-12-04T12:50:37.7092692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7092734Z raise RuntimeError(msg)
2025-12-04T12:50:37.7093142Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3292528640.
2025-12-04T12:50:37.7093144Z
2025-12-04T12:50:37.7093220Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7093531Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7093561Z
2025-12-04T12:50:37.7093650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7093714Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:50:37.7093775Z ======================= 1 failed, 7 deselected in 10.63s =======================
2025-12-04T12:50:37.7093815Z Got exit code 1
2025-12-04T12:50:37.7093855Z Retrying single test...
2025-12-04T12:50:37.7094082Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-da6a5b5746f06e70.xml
2025-12-04T12:50:37.7094139Z ============================= test session starts ==============================
2025-12-04T12:50:37.7094252Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T12:50:37.7094293Z cachedir: .pytest_cache
2025-12-04T12:50:37.7094452Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:50:37.7094518Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T12:50:37.7094560Z configfile: pytest.ini
2025-12-04T12:50:37.7094721Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T12:50:37.7094793Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T12:50:37.7095096Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7095141Z Running 1 items in this shard
2025-12-04T12:50:37.7095143Z
2025-12-04T12:50:37.7095523Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 12:49:10.046000 405035 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 405104
2025-12-04T12:50:37.7095681Z I1204 12:49:10.046000 405035 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 405105
2025-12-04T12:50:37.7095834Z I1204 12:49:10.047000 405035 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 405106
2025-12-04T12:50:37.7095985Z I1204 12:49:10.048000 405035 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 405107
2025-12-04T12:50:37.7096668Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
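[Editor's sketch] The FutureWarning above names its own replacement: get_state_dict() and set_state_dict() from torch.distributed.checkpoint.state_dict. A hedged sketch of that suggested migration, with signatures taken from the API doc the warning links; the model and optimizer arguments are placeholders, not this test's objects:

from torch.distributed.checkpoint.state_dict import (
    get_state_dict,
    set_state_dict,
)

def roundtrip_state_dict(model, optimizer):
    # Replaces the deprecated FSDP.set_state_dict_type(...) pattern: fetch
    # parallelism-agnostic state dicts for a wrapped model and optimizer ...
    model_sd, optim_sd = get_state_dict(model, optimizer)
    # ... checkpoint model_sd / optim_sd here as needed ...
    # ... and load them back with the matching setter.
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
    )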
2025-12-04T12:50:37.7096713Z FSDP.set_state_dict_type(
2025-12-04T12:50:37.7097383Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7097427Z FSDP.set_state_dict_type(
2025-12-04T12:50:37.7098118Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7098163Z FSDP.set_state_dict_type(
2025-12-04T12:50:37.7098826Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7098869Z FSDP.set_state_dict_type(
2025-12-04T12:50:37.7099367Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7099437Z device = _get_pg_default_device(group)
2025-12-04T12:50:37.7099967Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7100015Z device = _get_pg_default_device(group)
2025-12-04T12:50:37.7100506Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7100554Z device = _get_pg_default_device(group)
2025-12-04T12:50:37.7101037Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7101084Z device = _get_pg_default_device(group)
2025-12-04T12:50:37.7101758Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7101802Z FSDP.set_state_dict_type(
2025-12-04T12:50:37.7102468Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7102510Z FSDP.set_state_dict_type(
2025-12-04T12:50:37.7103022Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7103085Z distributed_c10d._get_pg_default_device(pg).type
2025-12-04T12:50:37.7103569Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7103628Z distributed_c10d._get_pg_default_device(pg).type
2025-12-04T12:50:37.7104302Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7104373Z FSDP.set_state_dict_type(
2025-12-04T12:50:37.7105039Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7105085Z FSDP.set_state_dict_type(
2025-12-04T12:50:37.7105574Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7105631Z distributed_c10d._get_pg_default_device(pg).type
2025-12-04T12:50:37.7106114Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7106173Z distributed_c10d._get_pg_default_device(pg).type
2025-12-04T12:50:37.7106411Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7106455Z local_shape = tensor.shape
2025-12-04T12:50:37.7106689Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7106727Z tensor.shape,
2025-12-04T12:50:37.7106960Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7106996Z tensor.dtype,
2025-12-04T12:50:37.7107228Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7107293Z local_shape = tensor.shape
2025-12-04T12:50:37.7107526Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7107564Z tensor.shape,
2025-12-04T12:50:37.7107794Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7107832Z tensor.dtype,
2025-12-04T12:50:37.7108063Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7108106Z local_shape = tensor.shape
2025-12-04T12:50:37.7108338Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7108388Z tensor.shape,
2025-12-04T12:50:37.7108632Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
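[Editor's sketch] The _state_dict_utils warnings above all point the same way: ShardedTensor is being deprecated in favor of DTensor. A minimal sketch of producing a sharded DTensor with the public API; module paths follow current PyTorch docs (older releases expose the same names under torch.distributed._tensor), and the mesh shape and tensor are illustrative only:

import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

def shard_over_ranks(world_size: int):
    # Requires torch.distributed to be initialized, one GPU per rank.
    mesh = init_device_mesh("cuda", (world_size,))
    param = torch.randn(8, 8)
    # Shard dim 0 across the mesh; each rank holds one local shard while the
    # DTensor keeps the global shape/placement metadata ShardedTensor carried.
    return distribute_tensor(param, mesh, placements=[Shard(0)])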
2025-12-04T12:50:37.7108669Z tensor.dtype,
2025-12-04T12:50:37.7108899Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7108943Z local_shape = tensor.shape
2025-12-04T12:50:37.7109175Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7109211Z tensor.shape,
2025-12-04T12:50:37.7109444Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
2025-12-04T12:50:37.7109482Z tensor.dtype,
2025-12-04T12:50:37.7109620Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7109816Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7110100Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7110247Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7110530Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7110649Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7110920Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7111062Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7111333Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7111474Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7111771Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7111903Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7112173Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7112315Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7112846Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3800039424.
2025-12-04T12:50:37.7112981Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7113171Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7113601Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7113709Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7113916Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7114079Z E1204 12:49:19.240000 405104 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T12:50:37.7114210Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7114362Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7114642Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7114787Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7115069Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7115184Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7115454Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7115594Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7115863Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7116030Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7116301Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7116431Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7116700Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7116841Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7117371Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3449815040.
2025-12-04T12:50:37.7117501Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7117690Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7118119Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7118230Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7118433Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7118594Z E1204 12:49:19.243000 405107 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T12:50:37.7118725Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7118878Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7119157Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7119305Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7119584Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7119745Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7120014Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7120155Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7120452Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7120594Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7120862Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7120991Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7121260Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7121403Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7121953Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3460300800.
2025-12-04T12:50:37.7122062Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7122251Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7122680Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7122789Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7122991Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7123153Z E1204 12:49:19.263000 405105 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T12:50:37.7123283Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7123437Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7123716Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7123864Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7124142Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7124257Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7124526Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7124685Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7124956Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7125095Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7125363Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7125493Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7125764Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7125926Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7126454Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3292528640.
2025-12-04T12:50:37.7126564Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7126753Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7127189Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7127297Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7127499Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7127657Z E1204 12:49:19.782000 405106 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T12:50:37.7127698Z FAILED [10.5151s] [100%]
2025-12-04T12:50:37.7127700Z
2025-12-04T12:50:37.7127757Z =================================== FAILURES ===================================
2025-12-04T12:50:37.7127918Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _
2025-12-04T12:50:37.7127967Z Traceback (most recent call last):
2025-12-04T12:50:37.7128131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T12:50:37.7128176Z self._join_processes(fn)
2025-12-04T12:50:37.7128349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T12:50:37.7128403Z self._check_return_codes(fn, elapsed_time)
2025-12-04T12:50:37.7128582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T12:50:37.7128626Z raise RuntimeError(error)
2025-12-04T12:50:37.7128707Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T12:50:37.7128775Z Traceback (most recent call last):
2025-12-04T12:50:37.7128937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7128982Z getattr(self, test_name)()
2025-12-04T12:50:37.7129141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7129175Z fn()
2025-12-04T12:50:37.7129328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7129369Z method(*args, **kwargs)
2025-12-04T12:50:37.7129520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7129560Z method(*args, **kwargs)
2025-12-04T12:50:37.7129750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7129803Z with policy():
2025-12-04T12:50:37.7129980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7130023Z raise RuntimeError(msg)
2025-12-04T12:50:37.7130438Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3800039424.
2025-12-04T12:50:37.7130441Z
2025-12-04T12:50:37.7130517Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7130834Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7130837Z
2025-12-04T12:50:37.7130933Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7130936Z
2025-12-04T12:50:37.7130996Z Process 1 exited with error code 10 and exception:
2025-12-04T12:50:37.7131046Z Traceback (most recent call last):
2025-12-04T12:50:37.7133653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7133699Z getattr(self, test_name)()
2025-12-04T12:50:37.7133861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7133898Z fn()
2025-12-04T12:50:37.7134052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7134094Z method(*args, **kwargs)
2025-12-04T12:50:37.7134248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7134293Z method(*args, **kwargs)
2025-12-04T12:50:37.7134443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7134483Z with policy():
2025-12-04T12:50:37.7134635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7134678Z raise RuntimeError(msg)
2025-12-04T12:50:37.7135091Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3460300800.
2025-12-04T12:50:37.7135093Z
2025-12-04T12:50:37.7135171Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7135527Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7135532Z
2025-12-04T12:50:37.7135620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7135622Z
2025-12-04T12:50:37.7135683Z Process 3 exited with error code 10 and exception:
2025-12-04T12:50:37.7135729Z Traceback (most recent call last):
2025-12-04T12:50:37.7135893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7135935Z getattr(self, test_name)()
2025-12-04T12:50:37.7136095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7136130Z fn()
2025-12-04T12:50:37.7136283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7136346Z method(*args, **kwargs)
2025-12-04T12:50:37.7136496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7136535Z method(*args, **kwargs)
2025-12-04T12:50:37.7136686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7136723Z with policy():
2025-12-04T12:50:37.7136874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7136915Z raise RuntimeError(msg)
2025-12-04T12:50:37.7137326Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3449815040.
2025-12-04T12:50:37.7137330Z
2025-12-04T12:50:37.7137405Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7137714Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7137716Z
2025-12-04T12:50:37.7137803Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7137805Z
2025-12-04T12:50:37.7137807Z
2025-12-04T12:50:37.7137884Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:50:37.7137973Z Process 0 terminated with exit code 10, terminating remaining processes.
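[Editor's sketch] The repro note quoted throughout this log gives an exact command and environment to run from the base repo dir. A small helper that replays it; the command and variables are verbatim from the failure message, while the wrapper itself is illustrative:

import os
import subprocess
import sys

TEST_FILE = "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py"
TEST_CASE = (
    "TestHSDPWithDeviceMeshAndDTensorCUDA."
    "test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda"
)

def rerun_failing_test() -> int:
    # Same env vars as the log's repro line; setting
    # PYTORCH_PRINT_REPRO_ON_FAILURE=0 would instead silence the repro note.
    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
    )
    return subprocess.run(
        [sys.executable, TEST_FILE, TEST_CASE], env=env
    ).returncode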
2025-12-04T12:50:37.7138247Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-da6a5b5746f06e70.xml -
2025-12-04T12:50:37.7138312Z =========================== short test summary info ============================
2025-12-04T12:50:37.7138639Z FAILED [10.5151s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T12:50:37.7138687Z Traceback (most recent call last):
2025-12-04T12:50:37.7138851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7138894Z getattr(self, test_name)()
2025-12-04T12:50:37.7139054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7139088Z fn()
2025-12-04T12:50:37.7139263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7139305Z method(*args, **kwargs)
2025-12-04T12:50:37.7139457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7139497Z method(*args, **kwargs)
2025-12-04T12:50:37.7139649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7139685Z with policy():
2025-12-04T12:50:37.7139881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7139922Z raise RuntimeError(msg)
2025-12-04T12:50:37.7140333Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3800039424.
2025-12-04T12:50:37.7140366Z 2025-12-04T12:50:37.7140440Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7140750Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7140752Z 2025-12-04T12:50:37.7140839Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7140841Z 2025-12-04T12:50:37.7140899Z Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.7140946Z Traceback (most recent call last): 2025-12-04T12:50:37.7141107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7141150Z getattr(self, test_name)() 2025-12-04T12:50:37.7141311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7141347Z fn() 2025-12-04T12:50:37.7141498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7141539Z method(*args, **kwargs) 2025-12-04T12:50:37.7141687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7141727Z method(*args, **kwargs) 2025-12-04T12:50:37.7141877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7141915Z with policy(): 2025-12-04T12:50:37.7142065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7142108Z raise RuntimeError(msg) 2025-12-04T12:50:37.7142514Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3460300800. 
2025-12-04T12:50:37.7142519Z 2025-12-04T12:50:37.7142592Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7142901Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7142903Z 2025-12-04T12:50:37.7142988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7142990Z 2025-12-04T12:50:37.7143048Z Process 3 exited with error code 10 and exception: 2025-12-04T12:50:37.7143094Z Traceback (most recent call last): 2025-12-04T12:50:37.7143283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7143327Z getattr(self, test_name)() 2025-12-04T12:50:37.7143486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7143520Z fn() 2025-12-04T12:50:37.7143671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7143711Z method(*args, **kwargs) 2025-12-04T12:50:37.7143861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7143900Z method(*args, **kwargs) 2025-12-04T12:50:37.7144050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7144098Z with policy(): 2025-12-04T12:50:37.7144253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7144309Z raise RuntimeError(msg) 2025-12-04T12:50:37.7144715Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3449815040. 2025-12-04T12:50:37.7144718Z 2025-12-04T12:50:37.7144791Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7145098Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:50:37.7145101Z 2025-12-04T12:50:37.7145189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7145253Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
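The per-process failures above all share one shape: the harness spawns one worker process per GPU, joins them, and converts any nonzero exit code (here, 10) into the RuntimeError shown. A minimal sketch of that spawn-and-check pattern, assuming plain multiprocessing rather than the internal common_distributed helpers:

    # Sketch of the per-GPU spawn-and-check pattern; not the harness's actual code.
    import multiprocessing as mp
    import sys

    def _worker(rank: int, world_size: int) -> None:
        # A real test would init_process_group(rank=rank, world_size=world_size)
        # and run the test body; exiting with code 10 signals a caught failure.
        sys.exit(0)

    def run_test(world_size: int = 4) -> None:
        procs = [mp.Process(target=_worker, args=(r, world_size)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        bad = {p.pid: p.exitcode for p in procs if p.exitcode != 0}
        if bad:
            raise RuntimeError(f"worker(s) exited nonzero: {bad}")

This is why the same leak message repeats per device: each of the four workers runs the check independently and each exits with code 10.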
2025-12-04T12:50:37.7145317Z ======================= 1 failed, 7 deselected in 10.52s =======================
2025-12-04T12:50:37.7145354Z Got exit code 1
2025-12-04T12:50:37.7145614Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T12:50:37.7145744Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:50:37.7145974Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2c01093c2d23902.xml
2025-12-04T12:50:37.7146032Z ============================= test session starts ==============================
2025-12-04T12:50:37.7146149Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T12:50:37.7146192Z cachedir: .pytest_cache
2025-12-04T12:50:37.7146350Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:50:37.7146399Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T12:50:37.7146440Z configfile: pytest.ini
2025-12-04T12:50:37.7146605Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T12:50:37.7146678Z collecting ... collected 8 items / 6 deselected / 2 selected
2025-12-04T12:50:37.7146731Z stepcurrent: skipping 6 already run items.
2025-12-04T12:50:37.7146775Z Running 2 items in this shard
2025-12-04T12:50:37.7146777Z
2025-12-04T12:50:37.7147135Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 12:49:23.255000 405573 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 405642
2025-12-04T12:50:37.7147292Z I1204 12:49:23.256000 405573 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 405643
2025-12-04T12:50:37.7147446Z I1204 12:49:23.257000 405573 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 405644
2025-12-04T12:50:37.7147597Z I1204 12:49:23.257000 405573 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 405645
2025-12-04T12:50:37.7148288Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7148361Z   FSDP.set_state_dict_type(
2025-12-04T12:50:37.7149030Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7149074Z   FSDP.set_state_dict_type(
2025-12-04T12:50:37.7149785Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7149829Z   FSDP.set_state_dict_type(
2025-12-04T12:50:37.7150494Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7150535Z   FSDP.set_state_dict_type(
2025-12-04T12:50:37.7151040Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7151092Z   device = _get_pg_default_device(group)
2025-12-04T12:50:37.7151581Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7151630Z   device = _get_pg_default_device(group)
2025-12-04T12:50:37.7152144Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7152194Z   device = _get_pg_default_device(group)
2025-12-04T12:50:37.7152680Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7152728Z   device = _get_pg_default_device(group)
2025-12-04T12:50:37.7152864Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7153034Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7153338Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7153487Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7153770Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7153887Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7154161Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7154305Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7154574Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7154716Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7154983Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7155115Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7155386Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7155528Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7156014Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 803209216 and is now 3235905536.
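The FutureWarning repeated above names its own replacement: get_state_dict()/set_state_dict() from torch.distributed.checkpoint.state_dict (see the API doc URL in the warning). A hedged sketch of that migration, assuming `model` is an FSDP-wrapped module and `optimizer` its optimizer:

    # Sketch of the migration the FutureWarning suggests; `model` and
    # `optimizer` are assumed to exist (FSDP-wrapped module + its optimizer).
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    def checkpoint_roundtrip(model, optimizer):
        # Replaces the deprecated FSDP.set_state_dict_type(...) context.
        model_state, optim_state = get_state_dict(model, optimizer)
        # ... persist/load model_state and optim_state, e.g. via
        # torch.distributed.checkpoint, as the linked tutorial shows ...
        set_state_dict(
            model,
            optimizer,
            model_state_dict=model_state,
            optim_state_dict=optim_state,
        )

Unlike the FSDP-specific context manager, these functions are advertised by the warning as working across FSDP1, FSDP2, and DDP.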
2025-12-04T12:50:37.7156125Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7156339Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7156721Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7156831Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7157034Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7157193Z E1204 12:49:31.541000 405645 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T12:50:37.7157324Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7157500Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7157780Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7157926Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7158204Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7158318Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7158590Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7158730Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7158999Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7159138Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7159405Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7159537Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7159851Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7159993Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7160473Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3300917248.
2025-12-04T12:50:37.7160610Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7160802Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7161180Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7161286Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7161488Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7161648Z E1204 12:49:31.560000 405644 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T12:50:37.7161793Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7161960Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7162240Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7162387Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7162667Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7162784Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7163054Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7163193Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7163460Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7163599Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7163869Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7164000Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7164271Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7164413Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7164920Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408.
2025-12-04T12:50:37.7165030Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7165219Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7165597Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7165704Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7165906Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7166078Z E1204 12:49:32.013000 405642 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T12:50:37.7166219Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7166372Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7166652Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7166799Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7167079Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7167197Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7167468Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7167607Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7167876Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7168014Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7168285Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7168414Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7168684Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7168824Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7169325Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160.
2025-12-04T12:50:37.7169435Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7169623Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7170053Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7170160Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7170365Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7170551Z E1204 12:49:32.015000 405643 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T12:50:37.7170591Z FAILED [9.6168s] [ 50%]
2025-12-04T12:50:37.7170593Z
2025-12-04T12:50:37.7170653Z =================================== FAILURES ===================================
2025-12-04T12:50:37.7170762Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___
2025-12-04T12:50:37.7170810Z Traceback (most recent call last):
2025-12-04T12:50:37.7170973Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T12:50:37.7171017Z     self._join_processes(fn)
2025-12-04T12:50:37.7171190Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T12:50:37.7171247Z     self._check_return_codes(fn, elapsed_time)
2025-12-04T12:50:37.7171425Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T12:50:37.7171470Z     raise RuntimeError(error)
2025-12-04T12:50:37.7171550Z RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T12:50:37.7171597Z Traceback (most recent call last):
2025-12-04T12:50:37.7171757Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7171802Z     getattr(self, test_name)()
2025-12-04T12:50:37.7171959Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7171995Z     fn()
2025-12-04T12:50:37.7172147Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7172192Z     method(*args, **kwargs)
2025-12-04T12:50:37.7172343Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7172385Z     method(*args, **kwargs)
2025-12-04T12:50:37.7172535Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7172572Z     with policy():
2025-12-04T12:50:37.7172726Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7172771Z     raise RuntimeError(msg)
2025-12-04T12:50:37.7173130Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 803209216 and is now 3235905536.
2025-12-04T12:50:37.7173134Z
2025-12-04T12:50:37.7173236Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7173496Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7173499Z
2025-12-04T12:50:37.7173588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7173591Z
2025-12-04T12:50:37.7173592Z
2025-12-04T12:50:37.7173668Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:50:37.7173757Z Process 3 terminated with exit code 10, terminating remaining processes.
2025-12-04T12:50:37.7174032Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2c01093c2d23902.xml -
2025-12-04T12:50:37.7174093Z =========================== short test summary info ============================
2025-12-04T12:50:37.7174379Z FAILED [9.6168s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T12:50:37.7174438Z Traceback (most recent call last):
2025-12-04T12:50:37.7174603Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7174646Z     getattr(self, test_name)()
2025-12-04T12:50:37.7174806Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7174841Z     fn()
2025-12-04T12:50:37.7174995Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7175035Z     method(*args, **kwargs)
2025-12-04T12:50:37.7175188Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7175229Z     method(*args, **kwargs)
2025-12-04T12:50:37.7175379Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7175417Z     with policy():
2025-12-04T12:50:37.7175569Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7175610Z     raise RuntimeError(msg)
2025-12-04T12:50:37.7175969Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 803209216 and is now 3235905536.
2025-12-04T12:50:37.7175972Z
2025-12-04T12:50:37.7176047Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7176312Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7176315Z
2025-12-04T12:50:37.7176402Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7176467Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:50:37.7176531Z ======================= 1 failed, 6 deselected in 9.63s ========================
2025-12-04T12:50:37.7176568Z Got exit code 1
2025-12-04T12:50:37.7176610Z Retrying single test...
2025-12-04T12:50:37.7176837Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d3c3cb956b43fa21.xml
2025-12-04T12:50:37.7176895Z ============================= test session starts ==============================
2025-12-04T12:50:37.7177031Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T12:50:37.7177075Z cachedir: .pytest_cache
2025-12-04T12:50:37.7177232Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:50:37.7177280Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T12:50:37.7177321Z configfile: pytest.ini
2025-12-04T12:50:37.7177485Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T12:50:37.7177557Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T12:50:37.7177810Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7177855Z Running 1 items in this shard
2025-12-04T12:50:37.7177858Z
2025-12-04T12:50:37.7178193Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 12:49:35.511000 406043 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 406112
2025-12-04T12:50:37.7178376Z I1204 12:49:35.511000 406043 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 406113
2025-12-04T12:50:37.7178528Z I1204 12:49:35.512000 406043 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 406114
2025-12-04T12:50:37.7178679Z I1204 12:49:35.512000 406043 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 406115
2025-12-04T12:50:37.7179361Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7179408Z   FSDP.set_state_dict_type(
2025-12-04T12:50:37.7180130Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7180173Z   FSDP.set_state_dict_type(
2025-12-04T12:50:37.7180840Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7180884Z   FSDP.set_state_dict_type(
2025-12-04T12:50:37.7181551Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7181593Z   FSDP.set_state_dict_type(
2025-12-04T12:50:37.7182126Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7182178Z   device = _get_pg_default_device(group)
2025-12-04T12:50:37.7182668Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7182717Z   device = _get_pg_default_device(group)
2025-12-04T12:50:37.7183204Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7183278Z   device = _get_pg_default_device(group)
2025-12-04T12:50:37.7183765Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`.
2025-12-04T12:50:37.7183811Z   device = _get_pg_default_device(group)
2025-12-04T12:50:37.7183948Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7184104Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7184388Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7184535Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7184815Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7184932Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7185203Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7185345Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7185613Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7185754Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7186023Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7186174Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7186446Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7186586Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7187068Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160.
2025-12-04T12:50:37.7187179Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7187379Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7187770Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7187878Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7188083Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7188241Z E1204 12:49:44.199000 406113 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T12:50:37.7188374Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7188525Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7188808Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7188953Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7189231Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7189348Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7189623Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7189801Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7190071Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7190211Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7190503Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7190636Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7190906Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7191046Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7191523Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3051356160.
2025-12-04T12:50:37.7191646Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7191857Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7192234Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7192342Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7192543Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7192703Z E1204 12:49:44.201000 406114 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T12:50:37.7192834Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7192987Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7193265Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7193411Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7193690Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7193806Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7194077Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7194216Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7194484Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7194622Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7194913Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7195043Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7195311Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7195452Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7195928Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408.
2025-12-04T12:50:37.7196062Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7196252Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7196632Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T12:50:37.7196740Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T12:50:37.7196944Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7197103Z E1204 12:49:44.251000 406112 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T12:50:37.7197232Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7197385Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7197663Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7197809Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7198087Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7198204Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7198472Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7198612Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7198880Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7199038Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7199309Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7199436Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7199761Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7199903Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7200379Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 950009856 and is now 3051356160.
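When a leak like the ones above needs to be localized, the allocator can be inspected directly after test teardown. A small sketch using public torch.cuda introspection APIs (this is not part of the test harness, just a debugging aid):

    # Sketch: dump what is still resident on a device after teardown.
    import torch

    def dump_allocator_state(device: int = 0) -> None:
        stats = torch.cuda.memory_stats(device)
        print("allocated.current:", stats.get("allocated_bytes.all.current"))
        print("reserved.current: ", stats.get("reserved_bytes.all.current"))
        print(torch.cuda.memory_summary(device))  # human-readable breakdown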
2025-12-04T12:50:37.7200520Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7200707Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7201088Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7201194Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7201400Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7201557Z E1204 12:49:44.268000 406115 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.7201596Z FAILED [9.9132s] [100%] 2025-12-04T12:50:37.7201598Z 2025-12-04T12:50:37.7201654Z =================================== FAILURES =================================== 2025-12-04T12:50:37.7201763Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T12:50:37.7201811Z Traceback (most recent call last): 2025-12-04T12:50:37.7201973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.7202019Z self._join_processes(fn) 2025-12-04T12:50:37.7202193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.7202249Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.7202428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.7202473Z raise RuntimeError(error) 2025-12-04T12:50:37.7202553Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.7202601Z Traceback (most recent call last): 2025-12-04T12:50:37.7202761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7202805Z getattr(self, test_name)() 2025-12-04T12:50:37.7202963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7202999Z fn() 2025-12-04T12:50:37.7203311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7203355Z method(*args, **kwargs) 2025-12-04T12:50:37.7203505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7203547Z method(*args, **kwargs) 2025-12-04T12:50:37.7203698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7203735Z with policy(): 2025-12-04T12:50:37.7203887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7203928Z raise RuntimeError(msg) 2025-12-04T12:50:37.7204287Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160. 2025-12-04T12:50:37.7204311Z 2025-12-04T12:50:37.7204387Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7204645Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7204648Z 2025-12-04T12:50:37.7204736Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7204738Z 2025-12-04T12:50:37.7204740Z 2025-12-04T12:50:37.7204815Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.7204903Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.7205175Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d3c3cb956b43fa21.xml - 2025-12-04T12:50:37.7205239Z =========================== short test summary info ============================ 2025-12-04T12:50:37.7205515Z FAILED [9.9132s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.7205562Z Traceback (most recent call last): 2025-12-04T12:50:37.7205726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7205770Z getattr(self, test_name)() 2025-12-04T12:50:37.7205928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7205963Z fn() 2025-12-04T12:50:37.7206113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7206155Z method(*args, **kwargs) 2025-12-04T12:50:37.7206306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7206348Z method(*args, **kwargs) 2025-12-04T12:50:37.7206496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7206534Z with policy(): 2025-12-04T12:50:37.7206685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7206726Z raise RuntimeError(msg) 2025-12-04T12:50:37.7207085Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160. 
2025-12-04T12:50:37.7207088Z 2025-12-04T12:50:37.7207185Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7207446Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7207448Z 2025-12-04T12:50:37.7207535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7207597Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.7207659Z ======================= 1 failed, 7 deselected in 9.92s ======================== 2025-12-04T12:50:37.7207697Z Got exit code 1 2025-12-04T12:50:37.7207737Z Retrying single test... 2025-12-04T12:50:37.7207967Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b9941c365edcbf43.xml 2025-12-04T12:50:37.7208023Z ============================= test session starts ============================== 2025-12-04T12:50:37.7208148Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.7208200Z cachedir: .pytest_cache 2025-12-04T12:50:37.7208357Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.7208402Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.7208444Z configfile: pytest.ini 2025-12-04T12:50:37.7208606Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.7208680Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.7208933Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7208976Z Running 1 items in this shard 2025-12-04T12:50:37.7208979Z 2025-12-04T12:50:37.7209315Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 12:49:47.955000 406513 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 406582 2025-12-04T12:50:37.7209470Z I1204 12:49:47.956000 406513 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 406583 2025-12-04T12:50:37.7209622Z I1204 12:49:47.957000 406513 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 406584 2025-12-04T12:50:37.7209892Z I1204 12:49:47.957000 406513 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 406585 2025-12-04T12:50:37.7210576Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.7210621Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7211289Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7211332Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7212023Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7212068Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7212732Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7212795Z FSDP.set_state_dict_type( 2025-12-04T12:50:37.7213309Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7213357Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7213848Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7213897Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7214387Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7214435Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7214920Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:50:37.7214967Z device = _get_pg_default_device(group) 2025-12-04T12:50:37.7215105Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7215262Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7215549Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7215695Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7215975Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7216110Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7216384Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7216524Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7216794Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7216934Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7217204Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7217355Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7217627Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7217768Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7218249Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3051356160. 
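Just before these leak reports, each rank also hit the `_get_pg_default_device` deprecation shown above. A hedged sketch of the rename it asks for: both helpers are private to torch.distributed.distributed_c10d, the replacement name comes from the warning text itself, and the assumption that it accepts the same single group argument is mine, so the sketch falls back to the old helper when the new one is absent.

from torch.distributed import distributed_c10d as c10d

def device_for_object_collectives(group=None):
    # Prefer the replacement named in the warning; fall back to the
    # deprecated helper on older PyTorch builds (assumption: both take
    # the same optional `group` argument).
    getter = getattr(c10d, "_get_object_coll_device", None)
    if getter is None:
        getter = c10d._get_pg_default_device
    return getter(group)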
2025-12-04T12:50:37.7218361Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7218553Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7218933Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7219042Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7219246Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7219406Z E1204 12:49:56.292000 406584 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.7219540Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7219745Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7220025Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7220173Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7220450Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7220590Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7220861Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7221001Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7221269Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7221407Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7221676Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7221830Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7222101Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7222243Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7222722Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160. 2025-12-04T12:50:37.7222833Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7223022Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7223400Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7223507Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7223710Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7223869Z E1204 12:49:56.297000 406583 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.7223999Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7224151Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7224427Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7224575Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7224892Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7225011Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7225280Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7225420Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7225688Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7225828Z E1204 12:49:56.306000 
406582 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7226107Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7226247Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7226518Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7226659Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7227142Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408. 2025-12-04T12:50:37.7227251Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7227439Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7227816Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7227922Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7228128Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7228287Z E1204 12:49:56.306000 406582 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.7228416Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7228569Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7228847Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7228993Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7229298Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7229416Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7229684Z E1204 12:49:56.363000 406585 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7229858Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7230125Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7230277Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7230561Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7230690Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7230960Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7231099Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7231581Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3051356160. 
2025-12-04T12:50:37.7231690Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7231878Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7232255Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7232363Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7232567Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7232723Z E1204 12:49:56.363000 406585 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.7232764Z FAILED [9.6153s] [100%] 2025-12-04T12:50:37.7232766Z 2025-12-04T12:50:37.7232821Z =================================== FAILURES =================================== 2025-12-04T12:50:37.7232931Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T12:50:37.7232979Z Traceback (most recent call last): 2025-12-04T12:50:37.7233141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.7233186Z self._join_processes(fn) 2025-12-04T12:50:37.7233383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.7233439Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.7233617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.7233662Z raise RuntimeError(error) 2025-12-04T12:50:37.7233742Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.7233789Z Traceback (most recent call last): 2025-12-04T12:50:37.7233949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7233992Z getattr(self, test_name)() 2025-12-04T12:50:37.7234149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7234195Z fn() 2025-12-04T12:50:37.7234346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7234397Z method(*args, **kwargs) 2025-12-04T12:50:37.7234547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7234588Z method(*args, **kwargs) 2025-12-04T12:50:37.7234737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7234775Z with policy(): 2025-12-04T12:50:37.7234926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7234969Z raise RuntimeError(msg) 2025-12-04T12:50:37.7235329Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408. 2025-12-04T12:50:37.7235334Z 2025-12-04T12:50:37.7235410Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7235667Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7235670Z 2025-12-04T12:50:37.7235756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7235759Z 2025-12-04T12:50:37.7235818Z Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.7235863Z Traceback (most recent call last): 2025-12-04T12:50:37.7236026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7236067Z getattr(self, test_name)() 2025-12-04T12:50:37.7236228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7236264Z fn() 2025-12-04T12:50:37.7236414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7236454Z method(*args, **kwargs) 2025-12-04T12:50:37.7236604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7236643Z method(*args, **kwargs) 2025-12-04T12:50:37.7236793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7236830Z with policy(): 2025-12-04T12:50:37.7236981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7237021Z raise RuntimeError(msg) 2025-12-04T12:50:37.7237403Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160. 
2025-12-04T12:50:37.7237407Z 2025-12-04T12:50:37.7237481Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7237737Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7237739Z 2025-12-04T12:50:37.7237826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7237828Z 2025-12-04T12:50:37.7237886Z Process 2 exited with error code 10 and exception: 2025-12-04T12:50:37.7237933Z Traceback (most recent call last): 2025-12-04T12:50:37.7238106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7238160Z getattr(self, test_name)() 2025-12-04T12:50:37.7238320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7238355Z fn() 2025-12-04T12:50:37.7238504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7238547Z method(*args, **kwargs) 2025-12-04T12:50:37.7238695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7238736Z method(*args, **kwargs) 2025-12-04T12:50:37.7238885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7238921Z with policy(): 2025-12-04T12:50:37.7239074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7239116Z raise RuntimeError(msg) 2025-12-04T12:50:37.7239474Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3051356160. 2025-12-04T12:50:37.7239476Z 2025-12-04T12:50:37.7239549Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7239848Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7239850Z 2025-12-04T12:50:37.7239936Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7239939Z 2025-12-04T12:50:37.7239941Z 2025-12-04T12:50:37.7240019Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.7240109Z Process 0 terminated with exit code 10, terminating remaining processes. 
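Every failure above comes from the same mem-leak policy: with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 the harness snapshots caching-allocator and driver-level memory around each test body and raises only when both measurements grew, which is what the "CUDA driver API confirmed a leak" wording means. A minimal sketch of that bookkeeping, assuming a single device and using only public torch.cuda calls (the real check in common_utils.py walks every visible device and re-measures after emptying the cache):

import torch

def assert_no_cuda_leak(test_fn, device=0):
    # Hypothetical, simplified version of the leak policy: a leak is
    # reported only when the caching allocator AND the driver both show
    # more memory in use after the test than before.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)

    test_fn()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()  # return cached blocks before re-measuring
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    driver_before = total - free_before
    driver_after = total - free_after
    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible leak: caching allocator {alloc_before} -> "
            f"{alloc_after}, driver {driver_before} -> {driver_after} "
            f"on device {device}"
        )

On ROCm builds the same code path runs through HIP's CUDA-compatibility layer, which is why an MI300 job still reports "CUDA driver" byte counts.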
2025-12-04T12:50:37.7240380Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b9941c365edcbf43.xml - 2025-12-04T12:50:37.7240443Z =========================== short test summary info ============================ 2025-12-04T12:50:37.7240717Z FAILED [9.6153s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:50:37.7240765Z Traceback (most recent call last): 2025-12-04T12:50:37.7240928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7241005Z getattr(self, test_name)() 2025-12-04T12:50:37.7241167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7241202Z fn() 2025-12-04T12:50:37.7241352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7241393Z method(*args, **kwargs) 2025-12-04T12:50:37.7241542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7241583Z method(*args, **kwargs) 2025-12-04T12:50:37.7241732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7241769Z with policy(): 2025-12-04T12:50:37.7241920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7241978Z raise RuntimeError(msg) 2025-12-04T12:50:37.7242349Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408. 
2025-12-04T12:50:37.7242353Z 2025-12-04T12:50:37.7242427Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7242685Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7242687Z 2025-12-04T12:50:37.7242772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7242774Z 2025-12-04T12:50:37.7242833Z Process 1 exited with error code 10 and exception: 2025-12-04T12:50:37.7242878Z Traceback (most recent call last): 2025-12-04T12:50:37.7243044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7243086Z getattr(self, test_name)() 2025-12-04T12:50:37.7243245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7243279Z fn() 2025-12-04T12:50:37.7243429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7243468Z method(*args, **kwargs) 2025-12-04T12:50:37.7243618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7243658Z method(*args, **kwargs) 2025-12-04T12:50:37.7243808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7243847Z with policy(): 2025-12-04T12:50:37.7243997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7244041Z raise RuntimeError(msg) 2025-12-04T12:50:37.7244398Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160. 
2025-12-04T12:50:37.7244400Z 2025-12-04T12:50:37.7244474Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7244730Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7244732Z 2025-12-04T12:50:37.7244820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7244855Z 2025-12-04T12:50:37.7244913Z Process 2 exited with error code 10 and exception: 2025-12-04T12:50:37.7244960Z Traceback (most recent call last): 2025-12-04T12:50:37.7245122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7245165Z getattr(self, test_name)() 2025-12-04T12:50:37.7245324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7245359Z fn() 2025-12-04T12:50:37.7245508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7245548Z method(*args, **kwargs) 2025-12-04T12:50:37.7245699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7245754Z method(*args, **kwargs) 2025-12-04T12:50:37.7245905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7245955Z with policy(): 2025-12-04T12:50:37.7246106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7246146Z raise RuntimeError(msg) 2025-12-04T12:50:37.7246503Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3051356160. 2025-12-04T12:50:37.7246505Z 2025-12-04T12:50:37.7246577Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7246836Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7246840Z 2025-12-04T12:50:37.7246925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7246990Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
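The FutureWarnings printed at the start of each retry (test_hsdp_dtensor_state_dict.py:83) flag the state-dict API this test exercises: FSDP.set_state_dict_type() is on its way out in favor of get_state_dict()/set_state_dict(). A hedged sketch of the replacement pattern the warning links to; the two imports exist in recent PyTorch, while model and optimizer stand in for an FSDP-wrapped module and its optimizer.

from torch.distributed.checkpoint.state_dict import (
    get_state_dict,
    set_state_dict,
)

def save_and_restore(model, optimizer):
    # Replaces the FSDP.state_dict_type()/FSDP.set_state_dict_type()
    # context managers: one call returns parallelism-aware state dicts.
    model_sd, optim_sd = get_state_dict(model, optimizer)
    # ... persist model_sd / optim_sd, e.g. via torch.distributed.checkpoint ...
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
    )

Unlike the per-wrapper context manager, the new pair is parallelism-agnostic (FSDP1, FSDP2, DDP), which is the point the warning text makes.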
2025-12-04T12:50:37.7247052Z ======================= 1 failed, 7 deselected in 9.63s ======================== 2025-12-04T12:50:37.7247091Z Got exit code 1 2025-12-04T12:50:37.7247303Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:50:37.7247432Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:50:37.7247660Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b2b248bb762b14d.xml 2025-12-04T12:50:37.7247720Z ============================= test session starts ============================== 2025-12-04T12:50:37.7247833Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.7247874Z cachedir: .pytest_cache 2025-12-04T12:50:37.7248031Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.7248077Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.7248118Z configfile: pytest.ini 2025-12-04T12:50:37.7248280Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.7248352Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.7248405Z stepcurrent: skipping 7 already run items. 2025-12-04T12:50:37.7248449Z Running 1 items in this shard 2025-12-04T12:50:37.7248451Z 2025-12-04T12:50:37.7248803Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 12:50:00.145000 406983 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 407052 2025-12-04T12:50:37.7248962Z I1204 12:50:00.145000 406983 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 407053 2025-12-04T12:50:37.7249113Z I1204 12:50:00.146000 406983 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 407054 2025-12-04T12:50:37.7249265Z I1204 12:50:00.147000 406983 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 407055 2025-12-04T12:50:37.7250385Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
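The c10d::allreduce_ warning above states its own fix: a non-differentiable op should register an explicit fallthrough on the Autograd key instead of relying on the deprecated not-implemented fallback. A toy sketch of that registration via the Python torch.library route, under the assumption that user code is doing the registering; demo::bump_ is a placeholder, and the real c10d operator is registered inside PyTorch itself.

import torch
from torch.library import Library, fallthrough_kernel

# Toy non-differentiable in-place op ("demo::bump_" is a placeholder).
lib = Library("demo", "DEF")
lib.define("bump_(Tensor(a!) x) -> Tensor(a!)")

def bump_impl(x):
    return x.add_(1)

lib.impl("bump_", bump_impl, "CompositeExplicitAutograd")
# The fix the warning describes: squash the autograd-not-implemented
# fallback by registering an explicit fallthrough for the Autograd key.
lib.impl("bump_", fallthrough_kernel, "Autograd")

print(torch.ops.demo.bump_(torch.zeros(3)))  # tensor([1., 1., 1.])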
2025-12-04T12:50:37.7250542Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7251602Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7251727Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7252787Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7252909Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7253992Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7254114Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7254831Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7254937Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7255656Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7255747Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7256451Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7256541Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7257243Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7257335Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7258035Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7258099Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7258820Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.7260629Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T12:50:37.7260785Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T12:50:37.7261069Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7261218Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T12:50:37.7261498Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7261615Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T12:50:37.7261885Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7262027Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7262300Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7262441Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T12:50:37.7262711Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7262839Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T12:50:37.7263135Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7263278Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T12:50:37.7263753Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856.
2025-12-04T12:50:37.7264052Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7264433Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda
2025-12-04T12:50:37.7264771Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7264932Z E1204 12:50:08.482000 407054 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T12:50:37.7265062Z E1204 12:50:08.482000 407053 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback, repro command, and suppression hint identical to pid 407054's above]
2025-12-04T12:50:37.7268161Z E1204 12:50:08.482000 407053 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 1268776960 and is now 3047161856.
2025-12-04T12:50:37.7269328Z E1204 12:50:08.482000 407053 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T12:50:37.7269459Z E1204 12:50:08.502000 407055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback, repro command, and suppression hint identical to pid 407054's above]
2025-12-04T12:50:37.7272848Z E1204 12:50:08.502000 407055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 803209216 and is now 3047161856.
2025-12-04T12:50:37.7274015Z E1204 12:50:08.502000 407055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T12:50:37.7274145Z E1204 12:50:08.575000 407052 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback, repro command, and suppression hint identical to pid 407054's above]
2025-12-04T12:50:37.7277242Z E1204 12:50:08.575000 407052 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408.
2025-12-04T12:50:37.7278410Z E1204 12:50:08.575000 407052 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T12:50:37.7278451Z FAILED [9.6141s] [100%]
2025-12-04T12:50:37.7278511Z =================================== FAILURES ===================================
2025-12-04T12:50:37.7278624Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____
2025-12-04T12:50:37.7278670Z Traceback (most recent call last):
2025-12-04T12:50:37.7278833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T12:50:37.7278878Z self._join_processes(fn)
2025-12-04T12:50:37.7279052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T12:50:37.7279106Z self._check_return_codes(fn, elapsed_time)
2025-12-04T12:50:37.7279285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T12:50:37.7279329Z raise RuntimeError(error)
2025-12-04T12:50:37.7279412Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T12:50:37.7279459Z Traceback (most recent call last):
2025-12-04T12:50:37.7279620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T12:50:37.7279663Z getattr(self, test_name)()
2025-12-04T12:50:37.7279859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T12:50:37.7279895Z fn()
2025-12-04T12:50:37.7280045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7280087Z method(*args, **kwargs)
2025-12-04T12:50:37.7280237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:50:37.7280278Z method(*args, **kwargs)
2025-12-04T12:50:37.7280452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:50:37.7280492Z with policy():
2025-12-04T12:50:37.7280644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:50:37.7280685Z raise RuntimeError(msg)
2025-12-04T12:50:37.7281037Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 1268776960 and is now 3047161856.
2025-12-04T12:50:37.7281118Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7281371Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda
2025-12-04T12:50:37.7281492Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7281571Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:50:37.7281659Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T12:50:37.7281932Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b2b248bb762b14d.xml -
2025-12-04T12:50:37.7281993Z =========================== short test summary info ============================
2025-12-04T12:50:37.7282266Z FAILED [9.6141s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
[the same traceback, leak report, and repro command as in the FAILURES section above, repeated verbatim]
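The failure above comes from the harness's memory-leak check (enabled here via the mem_leak_check config and PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots allocator counters around the test body and raises if they grew. A rough sketch of that bookkeeping with illustrative names; the real checker lives in torch/testing/_internal/common_utils.py and also consults driver-level memory:

    import torch

    def run_with_leak_check(test_fn, device=0):
        # Settle pending kernels and return cached blocks before the snapshot.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated(device)
        test_fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            # Same shape as this log's error: allocator went 0 -> 512 bytes.
            raise RuntimeError(
                f"possible leak: caching allocator was {before}, now {after}"
            )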
2025-12-04T12:50:37.7284347Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:50:37.7284411Z ======================= 1 failed, 7 deselected in 9.62s ========================
2025-12-04T12:50:37.7284449Z Got exit code 1
2025-12-04T12:50:37.7284490Z Retrying single test...
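After a failure the harness reruns just the failing test, as the sessions below show. A rough sketch of that outer retry loop with illustrative names; the real driver is PyTorch's test/run_test.py together with the stepcurrent plugin:

    import subprocess
    import sys

    TEST_ID = ("test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::"
               "TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda")

    for attempt in range(3):  # this log shows two retries after the first failure
        rc = subprocess.run([sys.executable, "-m", "pytest", "-v", TEST_ID]).returncode
        print(f"Got exit code {rc}")
        if rc == 0:
            break
        print("Retrying single test...")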
This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7287741Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7288823Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7288948Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7290053Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7290203Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7291258Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
2025-12-04T12:50:37.7291379Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7292093Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7292186Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7292892Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7292984Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7293718Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7293809Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7294508Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7294596Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7295296Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.7295382Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7296080Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7296144Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7296842Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7296901Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7297599Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
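The c10d::allreduce_ UserWarning above tells non-differentiable ops to register a fallthrough on the Autograd key (its text names the C++ API torch::CppFunction::makeFallthrough()). A Python-side sketch of the same idea on a toy operator; the "toy" namespace and op are hypothetical and unrelated to the test:

    import torch
    from torch.library import Library, fallthrough_kernel

    lib = Library("toy", "DEF")  # hypothetical operator namespace
    lib.define("scale(Tensor x, float f) -> Tensor")

    # A plain, non-differentiable kernel for the op.
    impl = Library("toy", "IMPL", "CompositeExplicitAutograd")
    impl.impl("scale", lambda x, f: x * f)

    # A fallthrough on Autograd lets dispatch skip that key, squashing the
    # "no autograd kernel" warning for ops that are not differentiable.
    autograd = Library("toy", "IMPL", "Autograd")
    autograd.impl("scale", fallthrough_kernel)

    out = torch.ops.toy.scale(torch.ones(2), 3.0)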
2025-12-04T12:50:37.7292093Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7292186Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type(
[identical FutureWarning from fully_sharded_data_parallel.py:822 repeated verbatim by the remaining ranks]
2025-12-04T12:50:37.7295296Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
2025-12-04T12:50:37.7295382Z FullyShardedDataParallel.set_state_dict_type(
[identical FutureWarning from fully_sharded_data_parallel.py:829 repeated verbatim by the remaining ranks]
2025-12-04T12:50:37.7297795Z E1204 12:50:20.676000 407522 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback identical in shape to the first run's, logged by pid 407522]
2025-12-04T12:50:37.7300959Z E1204 12:50:20.676000 407522 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3212836864.
2025-12-04T12:50:37.7301260Z E1204 12:50:20.676000 407522 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7301635Z E1204 12:50:20.676000 407522 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda
2025-12-04T12:50:37.7301949Z E1204 12:50:20.676000 407522 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7302107Z E1204 12:50:20.676000 407522 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T12:50:37.7302238Z E1204 12:50:20.693000 407523 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback, repro command, and suppression hint identical to pid 407522's above]
2025-12-04T12:50:37.7305361Z E1204 12:50:20.693000 407523 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 1268776960 and is now 3047161856.
2025-12-04T12:50:37.7306504Z E1204 12:50:20.693000 407523 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T12:50:37.7306635Z E1204 12:50:20.702000 407524 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback, repro command, and suppression hint identical to pid 407522's above]
2025-12-04T12:50:37.7309784Z E1204 12:50:20.702000 407524 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856.
2025-12-04T12:50:37.7310926Z E1204 12:50:20.702000 407524 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T12:50:37.7311057Z E1204 12:50:20.758000 407525 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback, repro command, and suppression hint identical to pid 407522's above]
2025-12-04T12:50:37.7314189Z E1204 12:50:20.758000 407525 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 958398464 and is now 3047161856.
2025-12-04T12:50:37.7315330Z E1204 12:50:20.758000 407525 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
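The repro command above can also be driven programmatically; a small wrapper that sets the two environment variables the harness names (illustrative, not part of the harness):

    import os
    import subprocess
    import sys

    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
    )
    subprocess.run(
        [sys.executable,
         "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
         "TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda"],
        env=env,
        check=False,
    )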
2025-12-04T12:50:37.7315371Z FAILED [9.6166s] [100%]
2025-12-04T12:50:37.7315428Z =================================== FAILURES ===================================
2025-12-04T12:50:37.7315539Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____
2025-12-04T12:50:37.7315607Z Traceback (most recent call last):
2025-12-04T12:50:37.7315772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T12:50:37.7315817Z self._join_processes(fn)
2025-12-04T12:50:37.7315991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T12:50:37.7316045Z self._check_return_codes(fn, elapsed_time)
2025-12-04T12:50:37.7316225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T12:50:37.7316269Z raise RuntimeError(error)
2025-12-04T12:50:37.7316350Z RuntimeError: Process 0 exited with error code 10 and exception:
[inner per-process traceback identical to the first run's FAILURES section above]
2025-12-04T12:50:37.7317934Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3212836864.
2025-12-04T12:50:37.7318013Z To execute this test, run the following from the base repo dir:
2025-12-04T12:50:37.7318268Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda
2025-12-04T12:50:37.7318360Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:50:37.7318441Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:50:37.7318529Z Process 0 terminated with exit code 10, terminating remaining processes.
2025-12-04T12:50:37.7318799Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-2368814337db79d3.xml -
2025-12-04T12:50:37.7318860Z =========================== short test summary info ============================
2025-12-04T12:50:37.7319133Z FAILED [9.6166s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
[the same traceback, leak report, and repro command as in the FAILURES section above, repeated verbatim]
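The leak report's numbers are raw bytes; a quick check of the deltas for device 0 in the report above:

    allocator_before, allocator_after = 0, 512
    driver_before, driver_after = 1_438_646_272, 3_212_836_864

    print(allocator_after - allocator_before)      # 512 bytes still allocated
    print((driver_after - driver_before) / 2**20)  # 1692.0 MiB of extra driver memory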
2025-12-04T12:50:37.7320787Z 2025-12-04T12:50:37.7320863Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7321117Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7321119Z 2025-12-04T12:50:37.7321205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7321267Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.7321330Z ======================= 1 failed, 7 deselected in 9.63s ======================== 2025-12-04T12:50:37.7321368Z Got exit code 1 2025-12-04T12:50:37.7321411Z Retrying single test... 2025-12-04T12:50:37.7321638Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c9482025be3f1024.xml 2025-12-04T12:50:37.7321697Z ============================= test session starts ============================== 2025-12-04T12:50:37.7321808Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.7321850Z cachedir: .pytest_cache 2025-12-04T12:50:37.7322007Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.7322054Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.7322094Z configfile: pytest.ini 2025-12-04T12:50:37.7322259Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.7322334Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:50:37.7322583Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7322628Z Running 1 items in this shard 2025-12-04T12:50:37.7322630Z 2025-12-04T12:50:37.7322959Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 12:50:24.602000 407923 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 407992 2025-12-04T12:50:37.7323115Z I1204 12:50:24.603000 407923 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 407993 2025-12-04T12:50:37.7323267Z I1204 12:50:24.604000 407923 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 407994 2025-12-04T12:50:37.7323442Z I1204 12:50:24.604000 407923 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 407995 2025-12-04T12:50:37.7324520Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). 
If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7324648Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7325736Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7325860Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7326915Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:50:37.7327037Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7328094Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
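The UserWarning repeated above for each rank also states the fix: an op that is intentionally non-differentiable should register a fallthrough for the Autograd dispatch key. A minimal Python-side sketch of that registration using torch.library, with a made-up myns::scale op for illustration (the C++ equivalent is registering torch::CppFunction::makeFallthrough(), as the warning text says):

    import torch

    lib = torch.library.Library("myns", "DEF")  # hypothetical namespace, illustration only
    lib.define("scale(Tensor x, float a) -> Tensor")

    # Plain backend kernel for the op.
    lib.impl("scale", lambda x, a: x * a, "CompositeExplicitAutograd")

    # The op is not differentiable, so register a fallthrough for Autograd;
    # the dispatcher then skips that key instead of warning during backward.
    lib.impl("scale", torch.library.fallthrough_kernel, "Autograd")

    out = torch.ops.myns.scale(torch.ones(3), 2.0)  # dispatches straight to the kernel

Here the warning comes from c10d::allreduce_ inside the backward pass of the test itself, so it points at the operator registration inside PyTorch rather than at the test code.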
2025-12-04T12:50:37.7328214Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:50:37.7328943Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7329039Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7329877Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7329997Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7330701Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7330790Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7331494Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7331584Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7332287Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:50:37.7332351Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7333054Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7333115Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7333837Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:50:37.7333897Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7334597Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
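The FutureWarning repeated above names its replacement directly: get_state_dict() and set_state_dict() from torch.distributed.checkpoint.state_dict. A minimal sketch of that API; the Linear model and SGD optimizer are placeholders, and the TestNoComm::test_no_dist case later in this shard suggests the calls also work without an initialized process group:

    import torch
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    model = torch.nn.Linear(4, 4)  # stands in for an FSDP/FSDP2/DDP-wrapped module
    optim = torch.optim.SGD(model.parameters(), lr=0.1)

    # One call covers model and optimizer state, replacing FSDP.set_state_dict_type().
    model_sd, optim_sd = get_state_dict(model, optim)

    # ...save/load model_sd and optim_sd, e.g. via torch.distributed.checkpoint...

    set_state_dict(model, optim, model_state_dict=model_sd, optim_state_dict=optim_sd)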
2025-12-04T12:50:37.7334655Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:50:37.7334802Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7334968Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7335252Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7335400Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7335678Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7335796Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7336067Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7336208Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7336477Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7336616Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7336886Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7337017Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7337290Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7337430Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7337905Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 1268776960 and is now 3047161856. 
2025-12-04T12:50:37.7338037Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7338229Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7338608Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7338716Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7338920Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7339092Z E1204 12:50:32.792000 407993 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:50:37.7339233Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7339384Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7339669Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7339855Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7340134Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7340250Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7340519Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7340660Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7340929Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7341072Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7341343Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7341472Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7341743Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7341883Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7342379Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3210739712. 2025-12-04T12:50:37.7342489Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7342679Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7343053Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7343160Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7343365Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7343541Z E1204 12:50:32.817000 407992 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:50:37.7343684Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7343836Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7344115Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7344261Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7344542Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7344658Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7344927Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7345068Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7345335Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7345476Z E1204 12:50:32.823000 407994 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7345745Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7345874Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7346143Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7346282Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7346772Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 2025-12-04T12:50:37.7346881Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7347070Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7347443Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7347550Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7347764Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7347933Z E1204 12:50:32.823000 407994 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:50:37.7348063Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:50:37.7348215Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:50:37.7348493Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7348638Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:50:37.7348920Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7349038Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:50:37.7349305Z E1204 12:50:32.848000 407995 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7349446Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7349766Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7349909Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:50:37.7350177Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7350308Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:50:37.7350577Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7350718Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:50:37.7351212Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 1256194048 and is now 3047161856. 
2025-12-04T12:50:37.7351321Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7351510Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7351883Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7351992Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:50:37.7352217Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7352376Z E1204 12:50:32.848000 407995 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:50:37.7352416Z FAILED [9.6147s] [100%] 2025-12-04T12:50:37.7352418Z 2025-12-04T12:50:37.7352473Z =================================== FAILURES =================================== 2025-12-04T12:50:37.7352582Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T12:50:37.7352629Z Traceback (most recent call last): 2025-12-04T12:50:37.7352792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:50:37.7352836Z self._join_processes(fn) 2025-12-04T12:50:37.7353013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:50:37.7353067Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:50:37.7353249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:50:37.7353293Z raise RuntimeError(error) 2025-12-04T12:50:37.7353374Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:50:37.7353419Z Traceback (most recent call last): 2025-12-04T12:50:37.7353582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7353624Z getattr(self, test_name)() 2025-12-04T12:50:37.7353784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7353819Z fn() 2025-12-04T12:50:37.7353972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7354014Z method(*args, **kwargs) 2025-12-04T12:50:37.7354165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7354205Z method(*args, **kwargs) 2025-12-04T12:50:37.7354356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7354393Z with policy(): 2025-12-04T12:50:37.7354546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7354588Z raise RuntimeError(msg) 2025-12-04T12:50:37.7354957Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 2025-12-04T12:50:37.7354962Z 2025-12-04T12:50:37.7355039Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7355294Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7355296Z 2025-12-04T12:50:37.7355383Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7355385Z 2025-12-04T12:50:37.7355387Z 2025-12-04T12:50:37.7355461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:50:37.7355549Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:50:37.7355823Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c9482025be3f1024.xml - 2025-12-04T12:50:37.7355904Z =========================== short test summary info ============================ 2025-12-04T12:50:37.7356175Z FAILED [9.6147s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:50:37.7356221Z Traceback (most recent call last): 2025-12-04T12:50:37.7356385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:50:37.7356427Z getattr(self, test_name)() 2025-12-04T12:50:37.7356587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:50:37.7356622Z fn() 2025-12-04T12:50:37.7356775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7356816Z method(*args, **kwargs) 2025-12-04T12:50:37.7356968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:50:37.7357007Z method(*args, **kwargs) 2025-12-04T12:50:37.7357157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:50:37.7357194Z with policy(): 2025-12-04T12:50:37.7357346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:50:37.7357387Z raise RuntimeError(msg) 2025-12-04T12:50:37.7357739Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 
2025-12-04T12:50:37.7357744Z 2025-12-04T12:50:37.7357820Z To execute this test, run the following from the base repo dir: 2025-12-04T12:50:37.7358077Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7358080Z 2025-12-04T12:50:37.7358166Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:50:37.7358229Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:50:37.7358291Z ======================= 1 failed, 7 deselected in 9.62s ======================== 2025-12-04T12:50:37.7358328Z Got exit code 1 2025-12-04T12:50:37.7358535Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T12:50:37.7358691Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:50:37.7358920Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-19f1225aafb7bb19.xml 2025-12-04T12:50:37.7358979Z ============================= test session starts ============================== 2025-12-04T12:50:37.7359092Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T12:50:37.7359133Z cachedir: .pytest_cache 2025-12-04T12:50:37.7359291Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:50:37.7359337Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:50:37.7359378Z configfile: pytest.ini 2025-12-04T12:50:37.7359540Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:50:37.7359616Z collecting ... collected 8 items / 8 deselected / 0 selected 2025-12-04T12:50:37.7359680Z stepcurrent: skipping 8 already run items. 
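The retry flow visible above (run the shard, rerun the failing test by itself, then mark it FAILED CONSISTENTLY and keep going because continue-through-error is set) reduces to a small classification loop. A hedged sketch, where run_once and the bare pytest invocation are placeholders for the harness's actual run_test.py machinery:

    import subprocess
    import sys

    def run_once(test_id: str) -> bool:
        # Placeholder invocation; the real harness adds sharding and report flags.
        proc = subprocess.run([sys.executable, "-m", "pytest", "-x", test_id])
        return proc.returncode == 0

    def classify(test_id: str) -> str:
        if run_once(test_id):
            return "passed"
        # Retry the single failing test in isolation, as the log does above.
        if run_once(test_id):
            return "flaky"
        return "failed consistently"  # continue with remaining tests instead of aborting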
2025-12-04T12:50:37.7359781Z Running 0 items in this shard 2025-12-04T12:50:37.7359783Z 2025-12-04T12:50:37.7360053Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-19f1225aafb7bb19.xml - 2025-12-04T12:50:37.7360113Z ============================ 8 deselected in 0.00s ============================= 2025-12-04T12:50:37.7361886Z The following tests failed consistently: ['test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda'] 2025-12-04T12:50:37.7361891Z 2025-12-04T12:50:37.7362112Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_e5c237ac1f49bda1_.log) 2025-12-04T12:50:37.7362116Z 2025-12-04T12:50:37.7362256Z Finished distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 ... [2025-12-04 12:50:37.623288][2237780.098315478], took 5.08min 2025-12-04T12:50:37.7362525Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T12:50:37.7362612Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:50:37.7362707Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T12:50:37.7362756Z Uploading artifacts took 0.00 seconds 2025-12-04T12:50:37.7362829Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 failed! 2025-12-04T12:50:37.7362972Z Running distributed/_composable/fsdp/test_fully_shard_training 1/1 ... [2025-12-04 12:50:37.626427][2237780.101457317] 2025-12-04T12:50:37.7363020Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:50:37.7363391Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_training.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:50:37.626588] 2025-12-04T12:58:21.9371844Z 2025-12-04T12:58:21.9372762Z distributed/_composable/fsdp/test_fully_shard_training 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_training_1.1_669eaa8ddb416cb4_.log 2025-12-04T12:58:21.9424560Z Running 25 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardForwardInputs::test_root_move_forward_input_to_device, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardRegisteredParams::test_param_registration_after_backward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardRegisteredParams::test_param_registration_after_forward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardCastAfterInit::test_to_float64_after_init, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_explicit_prefetching, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_multi_forward_module, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_non_root_forward_backward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_post_optim_event, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group_cpu_offload_eager, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group_unshard_async_op, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_single_group_shard_dim0, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_single_group_shard_largest_dim, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCompose::test_train_parity_with_activation_checkpointing, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShardPlacementFnMultiProcess::test_train_parity_shard_placement_fn_shard_largest_dim, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShardPlacementFnMultiThread::test_shard_placement_fn_contiguous_params_grads, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardSharedParams::test_train_parity_with_shared_params, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardGradientAccumulation::test_1f1b_microbatching, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardGradientAccumulation::test_gradient_accumulation, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardNDTraining::test_2d_mlp_with_nd_mesh, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardHSDP3DTraining::test_3d_mlp_with_nd_mesh, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardHSDPTraining::test_train_parity_hsdp, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardCustomForwardMethod::test_register_fsdp_forward_method, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShareCommContext::test_share_comm_context, 
test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardWorldSize1::test_train_parity_single_worldsize1 2025-12-04T12:58:21.9615676Z 2025-12-04T12:58:21.9616098Z Finished distributed/_composable/fsdp/test_fully_shard_training 1/1 ... [2025-12-04 12:58:21.961449][2238244.436468048], took 7.74min 2025-12-04T12:58:21.9635076Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T12:58:21.9648507Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:58:21.9653199Z Running distributed/_shard/sharded_tensor/ops/test_binary_cmp 1/1 ... [2025-12-04 12:58:21.965082][2238244.440109677] 2025-12-04T12:58:21.9653460Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:58:21.9654703Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_binary_cmp.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:58:21.965312] 2025-12-04T12:58:43.0133940Z 2025-12-04T12:58:43.0135118Z distributed/_shard/sharded_tensor/ops/test_binary_cmp 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_tensor.ops.test_binary_cmp_1.1_004b72dafdb32d25_.log 2025-12-04T12:58:43.0136263Z Running 4 items in this shard: test/distributed/_shard/sharded_tensor/ops/test_binary_cmp.py::TestShardedTensorBinaryOps::test_torch_allclose, test/distributed/_shard/sharded_tensor/ops/test_binary_cmp.py::TestShardedTensorBinaryOps::test_torch_allclose_tensor_specs, test/distributed/_shard/sharded_tensor/ops/test_binary_cmp.py::TestShardedTensorBinaryOps::test_torch_equal, test/distributed/_shard/sharded_tensor/ops/test_binary_cmp.py::TestShardedTensorBinaryOps::test_torch_equal_tensor_specs 2025-12-04T12:58:43.0137023Z 2025-12-04T12:58:43.0137182Z Finished distributed/_shard/sharded_tensor/ops/test_binary_cmp 1/1 ... [2025-12-04 12:58:43.013214][2238265.488239395], took 0.35min 2025-12-04T12:58:43.0149626Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T12:58:43.0164550Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:58:43.0168137Z Running distributed/test_nccl 1/1 ... [2025-12-04 12:58:43.016664][2238265.491693809] 2025-12-04T12:58:43.0168577Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:58:43.0170154Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_nccl.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:58:43.016865] 2025-12-04T12:58:54.3990418Z 2025-12-04T12:58:54.3991007Z distributed/test_nccl 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_nccl_1.1_7ecc0923d6033898_.log 2025-12-04T12:58:54.3993202Z Running 15 items in this shard: test/distributed/test_nccl.py::NCCLSymmetricMemoryTest::test_nccl_symmem_alloc, test/distributed/test_nccl.py::TestNCCLCUDA::test_all_gather_cuda_bfloat16, test/distributed/test_nccl.py::TestNCCLCUDA::test_all_gather_cuda_float32, test/distributed/test_nccl.py::TestNCCLCUDA::test_all_reduce_cuda_bfloat16, test/distributed/test_nccl.py::TestNCCLCUDA::test_all_reduce_cuda_float32, test/distributed/test_nccl.py::TestNCCLCUDA::test_broadcast_cuda_bfloat16, test/distributed/test_nccl.py::TestNCCLCUDA::test_broadcast_cuda_float32, test/distributed/test_nccl.py::TestNCCLCUDA::test_broadcast_cuda_float8_e4m3fnuz, test/distributed/test_nccl.py::TestNCCLCUDA::test_broadcast_cuda_float8_e5m2fnuz, test/distributed/test_nccl.py::TestNCCLCUDA::test_collective_errors_cuda, test/distributed/test_nccl.py::TestNCCLCUDA::test_reduce_cuda_bfloat16, test/distributed/test_nccl.py::TestNCCLCUDA::test_reduce_cuda_float32, test/distributed/test_nccl.py::TestNCCLCUDA::test_reduce_scatter_cuda_bfloat16, test/distributed/test_nccl.py::TestNCCLCUDA::test_reduce_scatter_cuda_float32, test/distributed/test_nccl.py::TestNCCLCUDA::test_unique_id_cuda 2025-12-04T12:58:54.3994900Z 2025-12-04T12:58:54.3995014Z Finished distributed/test_nccl 1/1 ... [2025-12-04 12:58:54.398803][2238276.873828052], took 0.19min 2025-12-04T12:58:54.4012774Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T12:58:54.4027892Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:58:54.4033230Z Running distributed/fsdp/test_fsdp_meta 1/1 ... [2025-12-04 12:58:54.403045][2238276.878074118] 2025-12-04T12:58:54.4033779Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:58:54.4034722Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_meta.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:58:54.403251] 2025-12-04T12:59:49.4539127Z 2025-12-04T12:59:49.4543867Z distributed/fsdp/test_fsdp_meta 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_meta_1.1_d3de5a971a752924_.log 2025-12-04T12:59:49.4549818Z Running 15 items in this shard: test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_bad_arg_meta, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_bad_arg_torchdistx, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_meta_device_with_mixed_precision, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_nested_model_with_meta_device_default_init_auto_wrap_False, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_nested_model_with_meta_device_default_init_auto_wrap_True, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_nested_model_with_meta_device_reset_params_auto_wrap_False, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_nested_model_with_meta_device_reset_params_auto_wrap_True, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_nested_model_with_torchdistX_default_init_auto_wrap_False, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_nested_model_with_torchdistX_default_init_auto_wrap_True, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_nested_model_with_torchdistX_init_fn_auto_wrap_False, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_nested_model_with_torchdistX_init_fn_auto_wrap_True, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_simple_model_with_meta_device_default_init, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_simple_model_with_meta_device_reset_params, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_simple_model_with_torchdistX_default_init, test/distributed/fsdp/test_fsdp_meta.py::TestFSDPWithMetaDevice::test_simple_model_with_torchdistX_init_fn 2025-12-04T12:59:49.4554326Z 2025-12-04T12:59:49.4554547Z Finished distributed/fsdp/test_fsdp_meta 1/1 ... [2025-12-04 12:59:49.453729][2238331.928751492], took 0.92min 2025-12-04T12:59:49.4562008Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T12:59:49.4579603Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:59:49.4582948Z Running distributed/test_data_parallel 1/1 ... [2025-12-04 12:59:49.458180][2238331.933209043] 2025-12-04T12:59:49.4583210Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:59:49.4584786Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_data_parallel.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:59:49.458378] 2025-12-04T13:00:12.4597185Z 2025-12-04T13:00:12.4599134Z distributed/test_data_parallel 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_data_parallel_1.1_f8cca603dab36073_.log 2025-12-04T13:00:12.4614350Z Running 46 items in this shard: test/distributed/test_data_parallel.py::TestDataParallel::test_autocast, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_buffers_requiring_grad, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_complex, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_device_args, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_function_deletion, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_lazy_linear, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_model_device, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_model_no_refcycles, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_module_zero_inputs, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_multiple_input, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_nested_input, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_nested_output, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_no_grad, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_rnn, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_small_back, test/distributed/test_data_parallel.py::TestDataParallel::test_data_parallel_sparse, test/distributed/test_data_parallel.py::TestDataParallel::test_gather_cpu, test/distributed/test_data_parallel.py::TestDataParallel::test_gather_different_len_dicts, test/distributed/test_data_parallel.py::TestDataParallel::test_gather_gpu, test/distributed/test_data_parallel.py::TestDataParallel::test_parallel_apply, test/distributed/test_data_parallel.py::TestDataParallel::test_parallel_apply_autocast, test/distributed/test_data_parallel.py::TestDataParallel::test_parallel_apply_passes_exception, test/distributed/test_data_parallel.py::TestDataParallel::test_parameter_list_dict_replica, test/distributed/test_data_parallel.py::TestDataParallel::test_replicate, test/distributed/test_data_parallel.py::TestDataParallel::test_replicate_buffers, test/distributed/test_data_parallel.py::TestDataParallel::test_save_replica_module, test/distributed/test_data_parallel.py::TestDataParallel::test_scatter_cpu, test/distributed/test_data_parallel.py::TestDataParallel::test_scatter_gpu, test/distributed/test_data_parallel.py::TestDataParallel::test_strided_grad_layout, test/distributed/test_data_parallel.py::TestDataParallel::test_zero_grad, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_cuda_float16, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_cuda_float32, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_cuda_float64, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_cuda_float16, 
test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_cuda_float32, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_cuda_float64, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_dict_cuda_float16, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_dict_cuda_float32, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_dict_cuda_float64, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_list_cuda_float16, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_list_cuda_float32, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_list_cuda_float64, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_tuple_cuda_float16, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_tuple_cuda_float32, test/distributed/test_data_parallel.py::TestDataParallelDeviceTypeCUDA::test_data_parallel_module_kwargs_only_empty_tuple_cuda_float64 2025-12-04T13:00:12.4623907Z 2025-12-04T13:00:12.4624084Z Finished distributed/test_data_parallel 1/1 ... [2025-12-04 13:00:12.459382][2238354.934406424], took 0.38min 2025-12-04T13:00:12.4624656Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T13:00:12.4635245Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:00:12.4639237Z Running distributed/checkpoint/test_state_dict 1/1 ... [2025-12-04 13:00:12.463706][2238354.938736128] 2025-12-04T13:00:12.4639778Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:00:12.4640796Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/checkpoint/test_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:00:12.463907] 2025-12-04T13:02:46.8115710Z 2025-12-04T13:02:46.8116943Z distributed/checkpoint/test_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_state_dict_1.1_fdc082e611ce971d_.log 2025-12-04T13:02:46.8125734Z Running 25 items in this shard: test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_activation_ckpt_fqns_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_activation_ckpt_fqns_fsdp1, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_broadcast_from_rank0, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_compiled_fsdp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_cpu_offload_full_state_dict, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_deprecate_api, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_extra_state, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_flattened_osd, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp2, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp_root_not_initialized, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_multi_device_load_model_state_dict, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_multi_param_groups, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_non_persistent_buffers, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_optim_state_dict_param_matching, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_set_cpu_model_state_dict_broadcast_from_rank0, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_setting_meta_device_model, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_setting_meta_device_model_broadcasting_and_memory, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_shared_weight, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_single_gpu, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_state_dict_with_hook_on_keys, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_strict, test/distributed/checkpoint/test_state_dict.py::TestNoComm::test_no_dist 2025-12-04T13:02:46.8131764Z 2025-12-04T13:02:46.8131974Z Finished distributed/checkpoint/test_state_dict 1/1 ... [2025-12-04 13:02:46.812710][2238509.287736319], took 2.57min 2025-12-04T13:02:46.8146168Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T13:02:46.8159065Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:02:46.8161535Z Running distributed/fsdp/test_fsdp_core 3/3 ... [2025-12-04 13:02:46.816037][2238509.291066705] 2025-12-04T13:02:46.8161781Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:02:46.8163135Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_core.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:02:46.816199] 2025-12-04T13:24:33.6235284Z 2025-12-04T13:24:33.6236140Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 3/3 (test/test-reports/distributed.fsdp.test_fsdp_core_3.3_fbe45a0587bc369b_.log) 2025-12-04T13:24:33.6237154Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9ded38eb0738e27f.xml 2025-12-04T13:24:33.6237807Z ============================= test session starts ============================== 2025-12-04T13:24:33.6238305Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6238742Z cachedir: .pytest_cache 2025-12-04T13:24:33.6239281Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6240115Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6240407Z configfile: pytest.ini 2025-12-04T13:24:33.6240914Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6241476Z collecting ... collected 60 items 2025-12-04T13:24:33.6242044Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T13:24:33.6249410Z Running 21 items in this shard: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda, 
test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.6255411Z 2025-12-04T13:24:33.6255953Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 13:02:48.493000 435186 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 435255 2025-12-04T13:24:33.6256665Z I1204 13:02:48.494000 435186 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 435256 2025-12-04T13:24:33.6257137Z I1204 13:02:48.494000 435186 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 435257 2025-12-04T13:24:33.6257604Z I1204 13:02:48.495000 435186 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 435258 2025-12-04T13:24:33.6258371Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6258991Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6259545Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6260133Z {} 2025-12-04T13:24:33.6260414Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:24:33.6260735Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6261600Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6262408Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6263025Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6263723Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6264137Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6264557Z {} 2025-12-04T13:24:33.6264787Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 
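[Editor's note] The recurring FSDP UserWarning about `device_id` prescribes its own fix. A minimal sketch, assuming the rank comes from the launcher via LOCAL_RANK (the `module` argument is left symbolic):

# Minimal sketch for silencing the `device_id` warning above; the rank is
# assumed to be provided by the launcher (e.g. torchrun sets LOCAL_RANK).
import os
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(rank)                      # option 1: set the current device first
# fsdp_model = FSDP(module, device_id=rank)      # option 2: pass an explicit index
# fsdp_model = FSDP(module, device_id=torch.device("cuda", rank))  # equivalent form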
2025-12-04T13:24:33.6265090Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6265791Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6266456Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6266934Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6267412Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6267875Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6268385Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6268827Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6269240Z {} 2025-12-04T13:24:33.6269473Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:24:33.6269749Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6270384Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6271045Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6271459Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6271922Z {} 2025-12-04T13:24:33.6272138Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:24:33.6272397Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6273035Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.6273656Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6273913Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6274274Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6274773Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6275317Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6275807Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6276262Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6276711Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6277182Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6277665Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6278232Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6278707Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6279165Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6279635Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6280174Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6280915Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
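[Editor's note] The numbers in the error above (caching allocator 512 -> 13312 bytes; driver-allocated memory roughly 2.30 GB -> 3.08 GB on device 2) come from before/after snapshots taken because this shard runs with mem_leak_check enabled. A simplified sketch of that two-level comparison follows; it is not the real implementation (see CudaMemoryLeakCheck in torch/testing/_internal/common_utils.py), but shows how the caching-allocator counter flags a suspect and the driver-level number "confirms" it:

# Simplified sketch of the leak check behind the RuntimeError above.
import torch

def run_with_leak_check(test_fn, device=0):
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    test_fn()
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()                     # return cached blocks before judging
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before and free_after < free_before:
        raise RuntimeError(
            f"leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after}, driver allocated "
            f"{total - free_before} -> {total - free_after}"
        )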
2025-12-04T13:24:33.6281571Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6281929Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6282553Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6283191Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6283595Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6284075Z [rank2]:E1204 13:02:54.673000 435257 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6284333Z dist init r=2, world=4 2025-12-04T13:24:33.6284542Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6284885Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6285425Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6285957Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6286450Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6286911Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6287358Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6287881Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6288361Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6288927Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6289394Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6289899Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6290386Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6290869Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6291576Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6292267Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6292718Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6293343Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6293882Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6294266Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6294733Z [rank3]:E1204 13:02:54.679000 435258 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6295016Z dist init r=3, world=4 2025-12-04T13:24:33.6295290Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6295717Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6296261Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6296762Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6297301Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6297826Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6298350Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6298853Z [rank0]:E1204 13:02:54.754000 435255 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6299344Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6299886Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6300369Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6300827Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6301290Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6301769Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6302528Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:24:33.6303183Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6303538Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6304201Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6304736Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6305144Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6305565Z [rank0]:E1204 13:02:54.754000 435255 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6305810Z dist init r=0, world=4 2025-12-04T13:24:33.6306014Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6306357Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6306858Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6307347Z 
[rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6307875Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6308329Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6308780Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6309255Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6309791Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6310264Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6310737Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6311194Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6311654Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6312132Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6312827Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
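[Editor's note] Each rank above runs in its own process (PIDs 435255-435258), and a failing rank reports back to the parent as "exit code: 10", which the parent's _check_return_codes turns into the RuntimeError shown below. The shape of that orchestration, reduced to a sketch (helper names and the sentinel value are illustrative, mirroring but not reproducing MultiProcessTestCase in torch.testing._internal.common_distributed):

import multiprocessing as mp

TEST_ERROR_EXIT_CODE = 10  # mirrors the "exit code: 10" lines above

def _rank_main(rank, world_size):
    try:
        pass  # the per-rank test body would run here
    except Exception:
        raise SystemExit(TEST_ERROR_EXIT_CODE)

if __name__ == "__main__":
    world_size = 4
    procs = [mp.Process(target=_rank_main, args=(r, world_size)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    bad = [(p.pid, p.exitcode) for p in procs if p.exitcode != 0]
    if bad:
        raise RuntimeError(f"ranks failed with (pid, exit code): {bad}")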
2025-12-04T13:24:33.6313491Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6313844Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6314498Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6315033Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6315404Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6315831Z [rank1]:E1204 13:02:54.757000 435256 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6316073Z dist init r=1, world=4 2025-12-04T13:24:33.6316492Z [rank0]:[W1204 13:02:55.659278578 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6316929Z FAILED [7.8135s] [ 4%] 2025-12-04T13:24:33.6317011Z 2025-12-04T13:24:33.6317073Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6317289Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T13:24:33.6317493Z Traceback (most recent call last): 2025-12-04T13:24:33.6317742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6317991Z self._join_processes(fn) 2025-12-04T13:24:33.6318240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6318506Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6318778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6319042Z raise RuntimeError(error) 2025-12-04T13:24:33.6319198Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6319385Z Traceback (most recent call last): 2025-12-04T13:24:33.6319626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6319944Z getattr(self, test_name)() 2025-12-04T13:24:33.6320177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6320433Z fn() 2025-12-04T13:24:33.6320640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6320874Z method(*args, **kwargs) 2025-12-04T13:24:33.6321099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6321333Z method(*args, **kwargs) 
2025-12-04T13:24:33.6321557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6321786Z with policy(): 2025-12-04T13:24:33.6321999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6322239Z raise RuntimeError(msg) 2025-12-04T13:24:33.6322681Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6323092Z 2025-12-04T13:24:33.6323167Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6323589Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6323886Z 2025-12-04T13:24:33.6323975Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6324104Z 2025-12-04T13:24:33.6324105Z 2025-12-04T13:24:33.6324184Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6324390Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6324757Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9ded38eb0738e27f.xml - 2025-12-04T13:24:33.6325090Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6325459Z FAILED [7.8135s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6325830Z Traceback (most recent call last): 2025-12-04T13:24:33.6326093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6326339Z getattr(self, test_name)() 2025-12-04T13:24:33.6326574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6326809Z fn() 2025-12-04T13:24:33.6327012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6327243Z method(*args, **kwargs) 2025-12-04T13:24:33.6327463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6327694Z method(*args, **kwargs) 2025-12-04T13:24:33.6327915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6328147Z with policy(): 2025-12-04T13:24:33.6328359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6328594Z raise RuntimeError(msg) 2025-12-04T13:24:33.6329038Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! 
Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6329447Z 2025-12-04T13:24:33.6329524Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6329932Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6330219Z 2025-12-04T13:24:33.6330310Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6330501Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.6330660Z ============================== 1 failed in 7.98s =============================== 2025-12-04T13:24:33.6330794Z Got exit code 1 2025-12-04T13:24:33.6330891Z Retrying single test... 2025-12-04T13:24:33.6331147Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-557e3c705cc8ff82.xml 2025-12-04T13:24:33.6331432Z ============================= test session starts ============================== 2025-12-04T13:24:33.6331646Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6331836Z cachedir: .pytest_cache 2025-12-04T13:24:33.6332063Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6332343Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6332464Z configfile: pytest.ini 2025-12-04T13:24:33.6332693Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6332968Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.6333324Z stepcurrent: skipping 0 already run items. 
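[Editor's note] Before the retry runs, note the ProcessGroupNCCL warning printed at the end of the first attempt: destroy_process_group() was never called before the worker processes exited. The teardown it asks for, as a minimal sketch (assumes launch via torchrun so init_process_group can read rank/world size from the environment):

import torch.distributed as dist

def main():
    dist.init_process_group("nccl")
    try:
        pass  # collectives / FSDP work goes here
    finally:
        dist.destroy_process_group()  # explicit shutdown avoids the leak warning at exit

if __name__ == "__main__":
    main()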
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6333649Z Running 1 items in this shard 2025-12-04T13:24:33.6333722Z 2025-12-04T13:24:33.6334051Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 13:02:58.868000 435572 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 435641 2025-12-04T13:24:33.6334591Z I1204 13:02:58.868000 435572 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 435642 2025-12-04T13:24:33.6334966Z I1204 13:02:58.869000 435572 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 435643 2025-12-04T13:24:33.6335312Z I1204 13:02:58.870000 435572 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 435644 2025-12-04T13:24:33.6335867Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6336313Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6336756Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6337198Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6337572Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6337967Z {} 2025-12-04T13:24:33.6338173Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:24:33.6338389Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6339004Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6339602Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6340044Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6340435Z {} 2025-12-04T13:24:33.6340638Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:24:33.6340853Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6341496Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6342091Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6342549Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6342996Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6343369Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6343762Z {} 2025-12-04T13:24:33.6343966Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:24:33.6344198Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6344818Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6345406Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6345859Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6346300Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6346674Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6347062Z {} 2025-12-04T13:24:33.6347264Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:24:33.6347479Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6348081Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.6348666Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6348912Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6349264Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6349805Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6350293Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6350780Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6351265Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6351716Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6352191Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6352663Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6353133Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6353608Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6354099Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6354560Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6355039Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6355739Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
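[Editor's note] The enable_nested_tensor UserWarning from transformer.py:144 repeats on every rank in both attempts. It is advisory noise, not the failure: it disappears when the encoder layer is built batch-first, as in this minimal sketch (dimensions are arbitrary):

import torch.nn as nn

# With batch_first=True the inputs are [batch, seq, feature] and the
# nested-tensor fast path is satisfied, so the warning is not emitted.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)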
2025-12-04T13:24:33.6356395Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6356751Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6357369Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6357902Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6358283Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6358706Z [rank3]:E1204 13:03:04.982000 435644 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6358950Z dist init r=3, world=4 2025-12-04T13:24:33.6359155Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6359498Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6360030Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6360551Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6361037Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6361493Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6361939Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6362410Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6362884Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6363385Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6363856Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6364314Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6364772Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6365246Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6365941Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:24:33.6366593Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6366948Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6367567Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6368098Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6368468Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6368889Z [rank0]:E1204 13:03:05.021000 435641 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6369134Z dist init r=0, world=4 2025-12-04T13:24:33.6369339Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6369680Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6370250Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6370737Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6371222Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6371675Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6372125Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6372609Z [rank2]:E1204 13:03:05.048000 435643 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6373096Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6373565Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6374033Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6374492Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6374956Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6375429Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6376120Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6376770Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6377127Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6377745Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6378275Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6378647Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6379066Z [rank2]:E1204 13:03:05.048000 435643 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6379311Z dist init r=2, world=4 2025-12-04T13:24:33.6379541Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6379974Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6380475Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6380963Z 
[rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6381452Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6381925Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6382386Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6382858Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6383330Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6383801Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6384392Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6384855Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6385315Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6385788Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6386485Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:24:33.6387138Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6387496Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6388115Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6388648Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6389051Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6389472Z [rank1]:E1204 13:03:05.063000 435642 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6389771Z dist init r=1, world=4 2025-12-04T13:24:33.6390179Z [rank0]:[W1204 13:03:05.757607700 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6390594Z FAILED [7.8142s] [100%] 2025-12-04T13:24:33.6390658Z 2025-12-04T13:24:33.6390719Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6390935Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T13:24:33.6391153Z Traceback (most recent call last): 2025-12-04T13:24:33.6391402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6391674Z self._join_processes(fn) 2025-12-04T13:24:33.6391923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6392188Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6392459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6392723Z raise RuntimeError(error) 2025-12-04T13:24:33.6392877Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6393041Z Traceback (most recent call last): 2025-12-04T13:24:33.6393286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6393537Z getattr(self, test_name)() 2025-12-04T13:24:33.6393772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6394008Z fn() 2025-12-04T13:24:33.6394211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6394446Z method(*args, **kwargs) 2025-12-04T13:24:33.6394670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6394903Z method(*args, **kwargs) 
2025-12-04T13:24:33.6395123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6395352Z with policy(): 2025-12-04T13:24:33.6395565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6395801Z raise RuntimeError(msg) 2025-12-04T13:24:33.6396248Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6396657Z 2025-12-04T13:24:33.6396733Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6397097Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6397388Z 2025-12-04T13:24:33.6397476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6397603Z 2025-12-04T13:24:33.6397605Z 2025-12-04T13:24:33.6397720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6397925Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6398288Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-557e3c705cc8ff82.xml - 2025-12-04T13:24:33.6398622Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6398990Z FAILED [7.8142s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6399338Z Traceback (most recent call last): 2025-12-04T13:24:33.6399586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6399876Z getattr(self, test_name)() 2025-12-04T13:24:33.6400114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6400378Z fn() 2025-12-04T13:24:33.6400582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6400814Z method(*args, **kwargs) 2025-12-04T13:24:33.6401035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6401266Z method(*args, **kwargs) 2025-12-04T13:24:33.6401485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6401712Z with policy(): 2025-12-04T13:24:33.6401923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6402156Z raise RuntimeError(msg) 2025-12-04T13:24:33.6402602Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! 
Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6403015Z 2025-12-04T13:24:33.6403093Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6403459Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6403747Z 2025-12-04T13:24:33.6403838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6404027Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.6404195Z ======================= 1 failed, 20 deselected in 7.98s ======================= 2025-12-04T13:24:33.6404337Z Got exit code 1 2025-12-04T13:24:33.6404436Z Retrying single test... 2025-12-04T13:24:33.6404693Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-926021651d25b1a7.xml 2025-12-04T13:24:33.6404979Z ============================= test session starts ============================== 2025-12-04T13:24:33.6405193Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6405385Z cachedir: .pytest_cache 2025-12-04T13:24:33.6405611Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6405853Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6405973Z configfile: pytest.ini 2025-12-04T13:24:33.6406202Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6406516Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.6406874Z stepcurrent: skipping 0 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6407196Z Running 1 items in this shard 2025-12-04T13:24:33.6407269Z 2025-12-04T13:24:33.6407599Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 13:03:09.258000 435958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 436027 2025-12-04T13:24:33.6408120Z I1204 13:03:09.259000 435958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 436028 2025-12-04T13:24:33.6408467Z I1204 13:03:09.259000 435958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 436029 2025-12-04T13:24:33.6408813Z I1204 13:03:09.260000 435958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 436030 2025-12-04T13:24:33.6409392Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6409895Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6410274Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6410672Z {} 2025-12-04T13:24:33.6410879Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6411099Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6411717Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6412313Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6412771Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6413213Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6413587Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6413984Z {} 2025-12-04T13:24:33.6414188Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6414405Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6415009Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0.
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6415598Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6416091Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6416534Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6416906Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6417297Z {} 2025-12-04T13:24:33.6417500Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6417714Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6418321Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6418942Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6419399Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6419871Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6420248Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6420644Z {} 2025-12-04T13:24:33.6420852Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6421076Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6421686Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
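The `device_id` UserWarning above states its own remedy: either call torch.cuda.set_device() before wrapping, or pass an explicit device index rather than a bare "cuda". The snippet below is a hedged single-process illustration of both suggestions, and it also calls destroy_process_group(), whose absence ProcessGroupNCCL warns about elsewhere in this log; the one-rank world, the port, and the tiny nn.Linear module are placeholders, not taken from the test.

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# One-process "world" purely for illustration; the failing test runs 4 ranks.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)

local_rank = 0
torch.cuda.set_device(local_rank)  # the explicit call the warning asks for
model = FSDP(
    nn.Linear(8, 8).cuda(local_rank),
    device_id=local_rank,  # explicit index instead of the bare "cuda" device
)

dist.destroy_process_group()  # avoids the shutdown warning seen in this log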
2025-12-04T13:24:33.6422285Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6422540Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6422896Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6423403Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6423902Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6448553Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6449030Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6449480Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6450081Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6450553Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6451016Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6451480Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6451934Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6452406Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6452893Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6453586Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:24:33.6454238Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6454593Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6455214Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6455747Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6456115Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6456531Z [rank1]:E1204 13:03:15.327000 436028 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6456777Z dist init r=1, world=4 2025-12-04T13:24:33.6456988Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6457332Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6457822Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6458303Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6458783Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6459258Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6459749Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6460214Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6460678Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6461139Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6461603Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6462083Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6462538Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6463004Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6463692Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6464337Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6464685Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6465297Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6465822Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6466188Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6466605Z [rank3]:E1204 13:03:15.328000 436030 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6466846Z dist init r=3, world=4 2025-12-04T13:24:33.6467049Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6467385Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6467871Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6468385Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6468865Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6469314Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6469793Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6470259Z [rank2]:E1204 13:03:15.332000 436029 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6470725Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6471238Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6471701Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6472150Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6472602Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6473067Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6473751Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6474394Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6474740Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6475352Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6475877Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6476239Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6476652Z [rank2]:E1204 13:03:15.332000 436029 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6476892Z dist init r=2, world=4 2025-12-04T13:24:33.6477094Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6477431Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6477948Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6478429Z 
[rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6478911Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6479362Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6479859Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6480337Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6480814Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6481275Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6481739Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6482190Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6482646Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6483112Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6483797Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 
2025-12-04T13:24:33.6484439Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6484788Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6485401Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6485924Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6486285Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6486697Z [rank0]:E1204 13:03:15.380000 436027 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6486937Z dist init r=0, world=4 2025-12-04T13:24:33.6487498Z [rank0]:[W1204 13:03:15.161628683 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6487908Z FAILED [7.8133s] [100%] 2025-12-04T13:24:33.6487972Z 2025-12-04T13:24:33.6488033Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6488247Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T13:24:33.6488446Z Traceback (most recent call last): 2025-12-04T13:24:33.6488690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6488933Z self._join_processes(fn) 2025-12-04T13:24:33.6489182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6489456Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6489778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6490037Z raise RuntimeError(error) 2025-12-04T13:24:33.6490187Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.6490348Z Traceback (most recent call last): 2025-12-04T13:24:33.6490585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6490825Z getattr(self, test_name)() 2025-12-04T13:24:33.6491057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6491288Z fn() 2025-12-04T13:24:33.6491491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6491721Z method(*args, **kwargs) 2025-12-04T13:24:33.6491941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6492168Z method(*args, **kwargs) 
2025-12-04T13:24:33.6492385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6492610Z with policy(): 2025-12-04T13:24:33.6492818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6493047Z raise RuntimeError(msg) 2025-12-04T13:24:33.6493484Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:24:33.6493890Z 2025-12-04T13:24:33.6493966Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6494330Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6494616Z 2025-12-04T13:24:33.6494704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6494829Z 2025-12-04T13:24:33.6494889Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6495029Z Traceback (most recent call last): 2025-12-04T13:24:33.6495269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6495509Z getattr(self, test_name)() 2025-12-04T13:24:33.6495772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6496004Z fn() 2025-12-04T13:24:33.6496203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6496433Z method(*args, **kwargs) 2025-12-04T13:24:33.6496650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6496877Z method(*args, **kwargs) 2025-12-04T13:24:33.6497091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6497314Z with policy(): 2025-12-04T13:24:33.6497521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6497749Z raise RuntimeError(msg) 2025-12-04T13:24:33.6498189Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:24:33.6498627Z 2025-12-04T13:24:33.6498700Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6499057Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6499343Z 2025-12-04T13:24:33.6499431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6499555Z 2025-12-04T13:24:33.6499614Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6499795Z Traceback (most recent call last): 2025-12-04T13:24:33.6500034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6500276Z getattr(self, test_name)() 2025-12-04T13:24:33.6500508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6500736Z fn() 2025-12-04T13:24:33.6500933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6501159Z method(*args, **kwargs) 2025-12-04T13:24:33.6501375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6501601Z method(*args, **kwargs) 2025-12-04T13:24:33.6501815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6502037Z with policy(): 2025-12-04T13:24:33.6502246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6502474Z raise RuntimeError(msg) 2025-12-04T13:24:33.6502912Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6503315Z 2025-12-04T13:24:33.6503390Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6503747Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6504032Z 2025-12-04T13:24:33.6504121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6504243Z 2025-12-04T13:24:33.6504245Z 2025-12-04T13:24:33.6504358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6504559Z Process 1 terminated with exit code 10, terminating remaining processes. 
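The "Process 1 terminated with exit code 10" line is the parent's join-and-check step, visible in the traceback as _join_processes calling _check_return_codes. Below is a minimal, self-contained sketch of that pattern using torch.multiprocessing; the worker function and the hard-coded exit code are contrived for illustration, and this is not the harness's actual code.

import sys
import torch.multiprocessing as mp

def _worker(rank):
    # Contrived: make rank 1 fail the way the leak checker does (exit code 10).
    sys.exit(10 if rank == 1 else 0)

if __name__ == "__main__":
    ctx = mp.spawn(_worker, nprocs=4, join=False)
    try:
        ctx.join()  # raises once any child exits nonzero
    except mp.ProcessExitedException as e:
        print(f"Process {e.error_index} terminated with exit code {e.exit_code}")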
2025-12-04T13:24:33.6504916Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-926021651d25b1a7.xml - 2025-12-04T13:24:33.6505251Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6505619Z FAILED [7.8133s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.6505964Z Traceback (most recent call last): 2025-12-04T13:24:33.6506207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6506449Z getattr(self, test_name)() 2025-12-04T13:24:33.6506703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6506948Z fn() 2025-12-04T13:24:33.6507147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6507376Z method(*args, **kwargs) 2025-12-04T13:24:33.6507593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6507819Z method(*args, **kwargs) 2025-12-04T13:24:33.6508035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6508257Z with policy(): 2025-12-04T13:24:33.6508466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6508693Z raise RuntimeError(msg) 2025-12-04T13:24:33.6509131Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:24:33.6509535Z 2025-12-04T13:24:33.6509608Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6510014Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6510304Z 2025-12-04T13:24:33.6510393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6510523Z 2025-12-04T13:24:33.6510581Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6510723Z Traceback (most recent call last): 2025-12-04T13:24:33.6510971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6511220Z getattr(self, test_name)() 2025-12-04T13:24:33.6511455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6511690Z fn() 2025-12-04T13:24:33.6511894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6512126Z method(*args, **kwargs) 2025-12-04T13:24:33.6512343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6512574Z method(*args, **kwargs) 2025-12-04T13:24:33.6512795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6513025Z with policy(): 2025-12-04T13:24:33.6513275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6513514Z raise RuntimeError(msg) 2025-12-04T13:24:33.6513956Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:24:33.6514362Z 2025-12-04T13:24:33.6514436Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6514791Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6515074Z 2025-12-04T13:24:33.6515162Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6515297Z 2025-12-04T13:24:33.6515358Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6515510Z Traceback (most recent call last): 2025-12-04T13:24:33.6515750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6515996Z getattr(self, test_name)() 2025-12-04T13:24:33.6516224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6516455Z fn() 2025-12-04T13:24:33.6516654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6516881Z method(*args, **kwargs) 2025-12-04T13:24:33.6517098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6517331Z method(*args, **kwargs) 2025-12-04T13:24:33.6517551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6517776Z with policy(): 2025-12-04T13:24:33.6517983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6518212Z raise RuntimeError(msg) 2025-12-04T13:24:33.6518647Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6519047Z 2025-12-04T13:24:33.6519122Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6519481Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6519812Z 2025-12-04T13:24:33.6519899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6520087Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
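The repro recipe printed after each failure can also be driven from Python; only the two environment variables and the test path below come from the log, and the added PYTORCH_PRINT_REPRO_ON_FAILURE=0 is the suppression switch the message itself documents. The subprocess wrapper is illustrative, not part of the harness.

import os
import subprocess

env = dict(
    os.environ,
    PYTORCH_TEST_WITH_ROCM="1",
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
    PYTORCH_PRINT_REPRO_ON_FAILURE="0",  # silences the repro banner
)
subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda",
    ],
    env=env,
    check=False,  # the run is expected to fail while the leak persists
)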
2025-12-04T13:24:33.6520250Z ======================= 1 failed, 20 deselected in 7.97s ======================= 2025-12-04T13:24:33.6520387Z Got exit code 1 2025-12-04T13:24:33.6520642Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:24:33.6521008Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.6521363Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-df3521fde19f857f.xml 2025-12-04T13:24:33.6521646Z ============================= test session starts ============================== 2025-12-04T13:24:33.6521917Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6522107Z cachedir: .pytest_cache 2025-12-04T13:24:33.6522330Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6522568Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6522685Z configfile: pytest.ini 2025-12-04T13:24:33.6522909Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6523178Z collecting ... collected 60 items / 1 deselected / 59 selected 2025-12-04T13:24:33.6523338Z stepcurrent: skipping 1 already run items. 2025-12-04T13:24:33.6523467Z Running 20 items in this shard 2025-12-04T13:24:33.6523538Z 2025-12-04T13:24:33.6523870Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 13:03:19.380000 436344 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 436413 2025-12-04T13:24:33.6524416Z I1204 13:03:19.381000 436344 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 436414 2025-12-04T13:24:33.6524764Z I1204 13:03:19.382000 436344 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 436415 2025-12-04T13:24:33.6525107Z I1204 13:03:19.382000 436344 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 436416 2025-12-04T13:24:33.6525657Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6526098Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6526684Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.6527271Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6527721Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6528157Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6528732Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6529318Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6529814Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6530249Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6530850Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6531433Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6531881Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6532314Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6532880Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
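The enable_nested_tensor UserWarning repeated above also carries its own suggestion: build the encoder layer with batch_first=True. A minimal illustration follows; the model dimensions are invented and unrelated to the test.

import torch
import torch.nn as nn

# batch_first=True satisfies the condition the warning checks, so
# constructing the encoder this way does not emit it. Sizes are arbitrary.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)
out = encoder(torch.randn(2, 10, 32))  # (batch, seq, d_model)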
2025-12-04T13:24:33.6533478Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6533719Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6534081Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6534572Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6535053Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6535531Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6535985Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6536430Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6536894Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6537361Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6537824Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6538291Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6538747Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6539205Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6539670Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6540444Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:24:33.6541092Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6541445Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6542057Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6542582Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6542967Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6543396Z [rank2]:E1204 13:03:25.451000 436415 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6543639Z dist init r=2, world=4 2025-12-04T13:24:33.6543844Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6544180Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6544668Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6545148Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6545625Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6546073Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6546511Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6546972Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6547439Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6547905Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6548369Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6548821Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6549275Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6549805Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6550490Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:24:33.6551131Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6551478Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6552090Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6552642Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6553007Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6553422Z [rank0]:E1204 13:03:25.530000 436413 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6553664Z dist init r=0, world=4 2025-12-04T13:24:33.6553867Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6554203Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6554692Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6555170Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6555650Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6556097Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6556539Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6557004Z [rank1]:E1204 13:03:25.533000 436414 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6557466Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6557930Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6558399Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6558854Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6559329Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6559834Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6560516Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:24:33.6561160Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6561524Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6562146Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6562668Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6563032Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6563447Z [rank1]:E1204 13:03:25.533000 436414 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6563691Z dist init r=1, world=4 2025-12-04T13:24:33.6563892Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6564230Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6564713Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6565189Z 
[rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6565665Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6566114Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6566553Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6567015Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6567478Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6567939Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6568438Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6568889Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6569344Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6569855Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6570540Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
2025-12-04T13:24:33.6571211Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6571559Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6572168Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6572691Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6573057Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6573470Z [rank3]:E1204 13:03:25.542000 436416 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6573710Z dist init r=3, world=4 2025-12-04T13:24:33.6574109Z [rank0]:[W1204 13:03:25.294615980 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6574516Z FAILED [7.8158s] [ 5%] 2025-12-04T13:24:33.6574579Z 2025-12-04T13:24:33.6574639Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6574855Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T13:24:33.6575092Z Traceback (most recent call last): 2025-12-04T13:24:33.6575337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6575585Z self._join_processes(fn) 2025-12-04T13:24:33.6575829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6576093Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6576362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6576625Z raise RuntimeError(error) 2025-12-04T13:24:33.6576780Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6576946Z Traceback (most recent call last): 2025-12-04T13:24:33.6577228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6577476Z getattr(self, test_name)() 2025-12-04T13:24:33.6577715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6577947Z fn() 2025-12-04T13:24:33.6578150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6578385Z method(*args, **kwargs) 2025-12-04T13:24:33.6578613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6578850Z method(*args, **kwargs) 
2025-12-04T13:24:33.6579074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6579305Z with policy(): 2025-12-04T13:24:33.6579523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6579814Z raise RuntimeError(msg) 2025-12-04T13:24:33.6580273Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6580677Z 2025-12-04T13:24:33.6580752Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6581112Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6581397Z 2025-12-04T13:24:33.6581484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6581610Z 2025-12-04T13:24:33.6581611Z 2025-12-04T13:24:33.6581692Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6581898Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6582259Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-df3521fde19f857f.xml - 2025-12-04T13:24:33.6582593Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6582963Z FAILED [7.8158s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6583313Z Traceback (most recent call last): 2025-12-04T13:24:33.6583565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6583816Z getattr(self, test_name)() 2025-12-04T13:24:33.6584053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6584288Z fn() 2025-12-04T13:24:33.6584490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6584727Z method(*args, **kwargs) 2025-12-04T13:24:33.6584948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6585183Z method(*args, **kwargs) 2025-12-04T13:24:33.6585402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6585629Z with policy(): 2025-12-04T13:24:33.6585836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6586069Z raise RuntimeError(msg) 2025-12-04T13:24:33.6586550Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! 
Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6586955Z 2025-12-04T13:24:33.6587033Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6587398Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6587684Z 2025-12-04T13:24:33.6587777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6587966Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.6588135Z ======================= 1 failed, 1 deselected in 7.98s ======================== 2025-12-04T13:24:33.6588293Z Got exit code 1 2025-12-04T13:24:33.6588394Z Retrying single test... 2025-12-04T13:24:33.6588666Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d7722e1051fab5f0.xml 2025-12-04T13:24:33.6588949Z ============================= test session starts ============================== 2025-12-04T13:24:33.6589157Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6589347Z cachedir: .pytest_cache 2025-12-04T13:24:33.6589574Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6589868Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6589991Z configfile: pytest.ini 2025-12-04T13:24:33.6590220Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6590504Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.6590863Z stepcurrent: skipping 1 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6591186Z Running 1 items in this shard 2025-12-04T13:24:33.6591258Z 2025-12-04T13:24:33.6591583Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 13:03:29.796000 436730 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 436799 2025-12-04T13:24:33.6592095Z I1204 13:03:29.797000 436730 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 436800 2025-12-04T13:24:33.6592444Z I1204 13:03:29.797000 436730 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 436801 2025-12-04T13:24:33.6592792Z I1204 13:03:29.798000 436730 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 436802 2025-12-04T13:24:33.6593346Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6593792Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6594377Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6594967Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6595452Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6595897Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6596471Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6597057Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6597512Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6597976Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6598551Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
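The enable_nested_tensor warning repeated above names the constructor flag it wants. A minimal sketch of the suggested construction; the layer sizes are made up:

    import torch.nn as nn

    # batch_first=False (the default) triggers the warning and disables the
    # nested-tensor fast path; batch_first=True keeps it enabled.
    layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=6, enable_nested_tensor=True)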
2025-12-04T13:24:33.6599140Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6599596Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6600073Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6600642Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6601228Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6601470Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6601818Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6602318Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6602805Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6603289Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6603743Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6604190Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6604687Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6605160Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6605627Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6606093Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6606548Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6607009Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T13:24:33.6607515Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6608208Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6608857Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6609213Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6609877Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6610409Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6610777Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6611196Z [rank2]:E1204 13:03:36.150000 436801 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6611443Z dist init r=2, world=4 2025-12-04T13:24:33.6611651Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6611994Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6612487Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6612971Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6613457Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6613907Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6614385Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6614854Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6615325Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6615792Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6616263Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6616744Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6617202Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6617672Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6618362Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:24:33.6619013Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6619367Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6620025Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6620558Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6620926Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6621353Z [rank1]:E1204 13:03:36.186000 436800 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6621599Z dist init r=1, world=4 2025-12-04T13:24:33.6621805Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6622147Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6622637Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6623120Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6623633Z [rank0]:E1204 13:03:36.195000 436799 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6624088Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6624534Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6625002Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6625471Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6625954Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6626437Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6626892Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6627352Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6627823Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6628523Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 
2025-12-04T13:24:33.6629164Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6629518Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6630170Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6630706Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6631077Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6631496Z [rank0]:E1204 13:03:36.195000 436799 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6631740Z dist init r=0, world=4 2025-12-04T13:24:33.6631947Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6632286Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6633066Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6633553Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6634037Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6634490Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6635037Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6635511Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6636012Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6636480Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6636948Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6637404Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6637867Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6638339Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6639034Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6639678Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6640058Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6640674Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6641200Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6641572Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6641988Z [rank3]:E1204 13:03:36.260000 436802 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6642232Z dist init r=3, world=4 2025-12-04T13:24:33.6642662Z [rank0]:[W1204 13:03:36.974016087 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6643075Z FAILED [8.1149s] [100%] 2025-12-04T13:24:33.6643139Z 2025-12-04T13:24:33.6643199Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6643412Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T13:24:33.6643616Z Traceback (most recent call last): 2025-12-04T13:24:33.6643864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6644108Z self._join_processes(fn) 2025-12-04T13:24:33.6644356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6644621Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6644918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6645204Z raise RuntimeError(error) 2025-12-04T13:24:33.6645363Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6645531Z Traceback (most recent call last): 2025-12-04T13:24:33.6645780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6646030Z getattr(self, test_name)() 2025-12-04T13:24:33.6646272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6646512Z fn() 2025-12-04T13:24:33.6646723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6646962Z method(*args, **kwargs) 2025-12-04T13:24:33.6647194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6647433Z method(*args, **kwargs) 2025-12-04T13:24:33.6647659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6647894Z with policy(): 2025-12-04T13:24:33.6648113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6648352Z raise RuntimeError(msg) 2025-12-04T13:24:33.6648795Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:24:33.6649199Z 2025-12-04T13:24:33.6649282Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6649648Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6649978Z 2025-12-04T13:24:33.6650067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6650196Z 2025-12-04T13:24:33.6650198Z 2025-12-04T13:24:33.6650276Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6650482Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6650845Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d7722e1051fab5f0.xml - 2025-12-04T13:24:33.6651177Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6651579Z FAILED [8.1149s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6651928Z Traceback (most recent call last): 2025-12-04T13:24:33.6652177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6652426Z getattr(self, test_name)() 2025-12-04T13:24:33.6652712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6652979Z fn() 2025-12-04T13:24:33.6653298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6653561Z method(*args, **kwargs) 2025-12-04T13:24:33.6653819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6654120Z method(*args, **kwargs) 2025-12-04T13:24:33.6654391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6654656Z with policy(): 2025-12-04T13:24:33.6654909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6655181Z raise RuntimeError(msg) 2025-12-04T13:24:33.6655666Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6656106Z 2025-12-04T13:24:33.6656202Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6656602Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6656906Z 2025-12-04T13:24:33.6657023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6657242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
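The ProcessGroupNCCL warning that precedes each of these failures also names its remedy: destroy the process group before the program exits. A minimal sketch of that pattern; run_rank_work and the "nccl" backend choice are placeholders:

    import torch.distributed as dist

    def run_rank_work():
        pass  # placeholder for the per-rank test body

    def main():
        dist.init_process_group("nccl")      # assumes env:// rendezvous vars are set
        try:
            run_rank_work()
        finally:
            dist.destroy_process_group()     # avoids the exit-time resource warning

    if __name__ == "__main__":
        main()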
2025-12-04T13:24:33.6657451Z ======================= 1 failed, 20 deselected in 8.27s ======================= 2025-12-04T13:24:33.6657629Z Got exit code 1 2025-12-04T13:24:33.6657755Z Retrying single test... 2025-12-04T13:24:33.6658568Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-58a89f40f522ee78.xml 2025-12-04T13:24:33.6658888Z ============================= test session starts ============================== 2025-12-04T13:24:33.6659139Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6659378Z cachedir: .pytest_cache 2025-12-04T13:24:33.6659636Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6659965Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6660116Z configfile: pytest.ini 2025-12-04T13:24:33.6660376Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6660700Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.6661089Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6661454Z Running 1 items in this shard 2025-12-04T13:24:33.6661549Z 2025-12-04T13:24:33.6661928Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 13:03:40.522000 437116 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 437185 2025-12-04T13:24:33.6662490Z I1204 13:03:40.523000 437116 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 437186 2025-12-04T13:24:33.6662875Z I1204 13:03:40.524000 437116 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 437187 2025-12-04T13:24:33.6663247Z I1204 13:03:40.524000 437116 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 437188 2025-12-04T13:24:33.6663827Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6664316Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6664952Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.6665610Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6666100Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6666563Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6667191Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6667819Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6668314Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6668795Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6669395Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6670075Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6670561Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.6671053Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6671668Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.6672319Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6672608Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6672986Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6673290Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6673466Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6673797Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6673952Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6674271Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6674433Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6674736Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6674905Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6675221Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6675385Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6675679Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6675850Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6676373Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
2025-12-04T13:24:33.6676524Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6676732Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6678066Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6678200Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6678461Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6678663Z [rank3]:E1204 13:03:46.682000 437188 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6678713Z dist init r=3, world=4 2025-12-04T13:24:33.6678885Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6679058Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6679376Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6679574Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6679927Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6680079Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6680370Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6680552Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6680847Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6681028Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6681332Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6681480Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6681788Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6681960Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6682479Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6682620Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6682828Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6683264Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6683398Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6683638Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6683817Z [rank2]:E1204 13:03:46.688000 437187 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6683881Z dist init r=2, world=4 2025-12-04T13:24:33.6684025Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6684232Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6684558Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6684737Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6685051Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6685183Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6685502Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6685665Z [rank1]:E1204 13:03:46.702000 437186 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6685965Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6686141Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6686425Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6686606Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6686898Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6687074Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6687584Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:24:33.6687719Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6687979Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6688367Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6688516Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6688740Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6688928Z [rank1]:E1204 13:03:46.702000 437186 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6689023Z dist init r=1, world=4 2025-12-04T13:24:33.6689189Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6689374Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6689674Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6689885Z 
[rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6690195Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6690357Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6690647Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6690820Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6691115Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6691287Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6691602Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6691749Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6692051Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6692210Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6692764Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 
2025-12-04T13:24:33.6692911Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6693118Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6693519Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6693646Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6693912Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6694123Z [rank0]:E1204 13:03:46.715000 437185 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6694174Z dist init r=0, world=4 2025-12-04T13:24:33.6694540Z [rank0]:[W1204 13:03:46.472432006 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6694592Z FAILED [7.8141s] [100%] 2025-12-04T13:24:33.6694594Z 2025-12-04T13:24:33.6694682Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6694815Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T13:24:33.6694888Z Traceback (most recent call last): 2025-12-04T13:24:33.6695063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6695135Z self._join_processes(fn) 2025-12-04T13:24:33.6695318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6695410Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6695599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6695671Z raise RuntimeError(error) 2025-12-04T13:24:33.6695774Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6695826Z Traceback (most recent call last): 2025-12-04T13:24:33.6696030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6696085Z getattr(self, test_name)() 2025-12-04T13:24:33.6696280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6696327Z fn() 2025-12-04T13:24:33.6696495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6696558Z method(*args, **kwargs) 2025-12-04T13:24:33.6696740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6696797Z method(*args, **kwargs) 
2025-12-04T13:24:33.6696971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6697021Z with policy(): 2025-12-04T13:24:33.6697221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6697281Z raise RuntimeError(msg) 2025-12-04T13:24:33.6697680Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6697682Z 2025-12-04T13:24:33.6697784Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6698044Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6698046Z 2025-12-04T13:24:33.6698163Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6698165Z 2025-12-04T13:24:33.6698167Z 2025-12-04T13:24:33.6698275Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6698399Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6698650Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-58a89f40f522ee78.xml - 2025-12-04T13:24:33.6698736Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6699009Z FAILED [7.8141s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6699103Z Traceback (most recent call last): 2025-12-04T13:24:33.6699292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6699346Z getattr(self, test_name)() 2025-12-04T13:24:33.6699531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6699573Z fn() 2025-12-04T13:24:33.6699818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6699868Z method(*args, **kwargs) 2025-12-04T13:24:33.6700042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6700094Z method(*args, **kwargs) 2025-12-04T13:24:33.6700268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6700329Z with policy(): 2025-12-04T13:24:33.6700509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6700562Z raise RuntimeError(msg) 2025-12-04T13:24:33.6700959Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! 
Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6700962Z 2025-12-04T13:24:33.6701054Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6701335Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6701338Z 2025-12-04T13:24:33.6701454Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6701528Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.6701614Z ======================= 1 failed, 20 deselected in 7.97s ======================= 2025-12-04T13:24:33.6701698Z Got exit code 1 2025-12-04T13:24:33.6701930Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:24:33.6702075Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.6702289Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5de6d646461dfeb7.xml 2025-12-04T13:24:33.6702367Z ============================= test session starts ============================== 2025-12-04T13:24:33.6702505Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6702553Z cachedir: .pytest_cache 2025-12-04T13:24:33.6702751Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6702822Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6702892Z configfile: pytest.ini 2025-12-04T13:24:33.6703092Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6720183Z collecting ... collected 60 items / 2 deselected / 58 selected 2025-12-04T13:24:33.6720254Z stepcurrent: skipping 2 already run items. 
2025-12-04T13:24:33.6720305Z Running 19 items in this shard 2025-12-04T13:24:33.6720307Z 2025-12-04T13:24:33.6720650Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 13:03:50.822000 437502 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 437571 2025-12-04T13:24:33.6720815Z I1204 13:03:50.823000 437502 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 437572 2025-12-04T13:24:33.6720975Z I1204 13:03:50.824000 437502 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 437573 2025-12-04T13:24:33.6721928Z I1204 13:03:50.824000 437502 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 437574 2025-12-04T13:24:33.6722296Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance) 2025-12-04T13:24:33.6722351Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6722650Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6722717Z {} 2025-12-04T13:24:33.6722827Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6722907Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6723411Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6723476Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6723839Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance) 2025-12-04T13:24:33.6723889Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6724248Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6724317Z {} 2025-12-04T13:24:33.6724422Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6724499Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6724991Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
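The recurring transformer.py:144 UserWarning above concerns how the test constructs its encoder: nested-tensor inference is requested, but the layer was built without batch_first=True, so the fast path is silently disabled. A minimal sketch of a construction that satisfies the warning (the sizes are made up for illustration):

    import torch.nn as nn

    # batch_first=True lets TransformerEncoder actually use the
    # nested-tensor fast path requested via enable_nested_tensor.
    layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)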
2025-12-04T13:24:33.6725055Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6725427Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance) 2025-12-04T13:24:33.6725501Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6725788Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6725853Z {} 2025-12-04T13:24:33.6725957Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6726033Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6726528Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6726590Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6726945Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance) 2025-12-04T13:24:33.6726994Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6727283Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6727349Z {} 2025-12-04T13:24:33.6727456Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6727528Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6728017Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
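The _wrap_utils.py:64 warning above appears when mixed precision is combined with an auto_wrap_policy and some wrapped submodule classes end up with mixed precision disabled. A hedged sketch of that combination; the policy, dtypes, and threshold are illustrative, not the test's actual configuration:

    import functools
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
    from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

    def shard(model, rank):
        # Wrap any submodule above a (made-up) parameter-count threshold.
        policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000)
        mp = MixedPrecision(param_dtype=torch.float16,
                            reduce_dtype=torch.float16,
                            buffer_dtype=torch.float16)
        return FSDP(model, auto_wrap_policy=policy, mixed_precision=mp,
                    device_id=torch.device("cuda", rank))

In this run the set of overridden submodule classes is empty (the bare {} in the message), so the warning is effectively a no-op here.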
2025-12-04T13:24:33.6728079Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6728230Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6728418Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6728720Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6728881Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6729169Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6729299Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6729585Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6729803Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6730089Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6730239Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6730520Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6730662Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6730947Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6731100Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6731602Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
2025-12-04T13:24:33.6731725Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6731925Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6732311Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6732428Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6732644Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6732840Z [rank3]:E1204 13:03:56.952000 437574 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6732884Z dist init r=3, world=4 2025-12-04T13:24:33.6733028Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6733189Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6733484Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6733639Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6733932Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6734085Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6734367Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6734516Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6734794Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6734944Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6735222Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6735360Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6735639Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6735791Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6736288Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:24:33.6736406Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6736604Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6736979Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6737114Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6737329Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6737497Z [rank0]:E1204 13:03:56.960000 437571 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6737538Z dist init r=0, world=4 2025-12-04T13:24:33.6737676Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6737837Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6738128Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6738294Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6738590Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6738716Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6738994Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6739145Z [rank1]:E1204 13:03:56.962000 437572 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6739425Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6739573Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6739892Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6740030Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6740312Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6740461Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6740956Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:24:33.6741072Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6741268Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6741667Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6741782Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6741996Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6742162Z [rank1]:E1204 13:03:56.962000 437572 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6742200Z dist init r=1, world=4 2025-12-04T13:24:33.6742340Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6742520Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6742823Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6742977Z 
[rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6743265Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6743389Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6743670Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6743822Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6744097Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6744250Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6744526Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6744666Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6744946Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6745098Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6745593Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:24:33.6745727Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6745926Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6746299Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6746413Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6746624Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6746790Z [rank2]:E1204 13:03:57.005000 437573 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6746851Z dist init r=2, world=4 2025-12-04T13:24:33.6747189Z [rank0]:[W1204 13:03:57.643680346 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6747230Z FAILED [7.8148s] [ 5%] 2025-12-04T13:24:33.6747233Z 2025-12-04T13:24:33.6747292Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6747406Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _ 2025-12-04T13:24:33.6747453Z Traceback (most recent call last): 2025-12-04T13:24:33.6747621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6747666Z self._join_processes(fn) 2025-12-04T13:24:33.6747841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6747897Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6748076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6748121Z raise RuntimeError(error) 2025-12-04T13:24:33.6748203Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6748249Z Traceback (most recent call last): 2025-12-04T13:24:33.6748412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6748455Z getattr(self, test_name)() 2025-12-04T13:24:33.6748616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6748654Z fn() 2025-12-04T13:24:33.6748809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6748850Z method(*args, **kwargs) 2025-12-04T13:24:33.6749002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6749043Z method(*args, **kwargs) 
2025-12-04T13:24:33.6749193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6749232Z with policy(): 2025-12-04T13:24:33.6749385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6749428Z raise RuntimeError(msg) 2025-12-04T13:24:33.6749867Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6749872Z 2025-12-04T13:24:33.6749951Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6750203Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6750206Z 2025-12-04T13:24:33.6750309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6750311Z 2025-12-04T13:24:33.6750313Z 2025-12-04T13:24:33.6750393Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6750481Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6750721Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5de6d646461dfeb7.xml - 2025-12-04T13:24:33.6750808Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6751074Z FAILED [7.8148s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6751120Z Traceback (most recent call last): 2025-12-04T13:24:33.6751290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6751332Z getattr(self, test_name)() 2025-12-04T13:24:33.6751495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6751530Z fn() 2025-12-04T13:24:33.6751686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6751728Z method(*args, **kwargs) 2025-12-04T13:24:33.6751881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6751921Z method(*args, **kwargs) 2025-12-04T13:24:33.6752073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6752110Z with policy(): 2025-12-04T13:24:33.6752263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6752304Z raise RuntimeError(msg) 2025-12-04T13:24:33.6752674Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! 
Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6752678Z 2025-12-04T13:24:33.6752754Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6753002Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6753004Z 2025-12-04T13:24:33.6753093Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6753156Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.6753222Z ======================= 1 failed, 2 deselected in 7.96s ======================== 2025-12-04T13:24:33.6753260Z Got exit code 1 2025-12-04T13:24:33.6753302Z Retrying single test... 2025-12-04T13:24:33.6753492Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e8c0c54162ac7d55.xml 2025-12-04T13:24:33.6753579Z ============================= test session starts ============================== 2025-12-04T13:24:33.6753695Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6753738Z cachedir: .pytest_cache 2025-12-04T13:24:33.6753897Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6753946Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6753986Z configfile: pytest.ini 2025-12-04T13:24:33.6754151Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6754225Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.6754469Z stepcurrent: skipping 2 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6754527Z Running 1 item in this shard 2025-12-04T13:24:33.6754539Z 2025-12-04T13:24:33.6754862Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 13:04:01.288000 437888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 437957 2025-12-04T13:24:33.6755020Z I1204 13:04:01.289000 437888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 437958 2025-12-04T13:24:33.6755173Z I1204 13:04:01.290000 437888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 437959 2025-12-04T13:24:33.6755326Z I1204 13:04:01.291000 437888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 437960 2025-12-04T13:24:33.6755689Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance) 2025-12-04T13:24:33.6755741Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6756033Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6756100Z {} 2025-12-04T13:24:33.6756205Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6756280Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6756779Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6756843Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6757201Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance) 2025-12-04T13:24:33.6757249Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.6757540Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:24:33.6757602Z {} 2025-12-04T13:24:33.6757734Z These modules will be wrapped as separate FSDP instances with mixed precision disabled. 2025-12-04T13:24:33.6757809Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:24:33.6758297Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0.
2025-12-04T13:24:33.6757201Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance)
2025-12-04T13:24:33.6757249Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.6757540Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-12-04T13:24:33.6757602Z {}
2025-12-04T13:24:33.6757734Z These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-12-04T13:24:33.6757809Z _warn_on_overridden_mixed_precision(overridden_module_classes)
2025-12-04T13:24:33.6758297Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.6758359Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.6758711Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance)
2025-12-04T13:24:33.6758759Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.6759057Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-12-04T13:24:33.6759131Z {}
2025-12-04T13:24:33.6759233Z These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-12-04T13:24:33.6759306Z _warn_on_overridden_mixed_precision(overridden_module_classes)
2025-12-04T13:24:33.6759832Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.6759893Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.6760254Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance)
2025-12-04T13:24:33.6760301Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.6760588Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-12-04T13:24:33.6760648Z {}
2025-12-04T13:24:33.6760751Z These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-12-04T13:24:33.6760821Z _warn_on_overridden_mixed_precision(overridden_module_classes)
2025-12-04T13:24:33.6761312Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.6761373Z device_from_device_id = _get_device_from_device_id(
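Each rank also warns that `device_id` was passed as a bare `cuda` device without an index. A minimal sketch of the two remedies the warning itself names, assuming a process group is already initialized; `model` and `rank` are placeholders:

    # Minimal sketch of the two fixes the UserWarning suggests; `model` and
    # `rank` are placeholders and the process group is assumed initialized.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap(model: torch.nn.Module, rank: int) -> FSDP:
        torch.cuda.set_device(rank)         # fix 1: pin the current device first
        return FSDP(model, device_id=rank)  # fix 2: explicit index, not bare "cuda"

Either change removes the ambiguity; passing an explicit index is the more self-documenting of the two.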
2025-12-04T13:24:33.6761518Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.6761681Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.6761970Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6762156Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]     getattr(self, test_name)()
2025-12-04T13:24:33.6762445Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6762572Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]     fn()
2025-12-04T13:24:33.6762851Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6763002Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6763282Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6763454Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6763733Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6763869Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]     with policy():
2025-12-04T13:24:33.6764149Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6764300Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]     raise RuntimeError(msg)
2025-12-04T13:24:33.6764798Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200.
2025-12-04T13:24:33.6764918Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.6765115Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6765493Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6765610Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.6765823Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6765989Z [rank1]:E1204 13:04:07.350000 437958 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T13:24:33.6766028Z dist init r=1, world=4
2025-12-04T13:24:33.6766169Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.6766347Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.6766638Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6766791Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]     getattr(self, test_name)()
2025-12-04T13:24:33.6767079Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6767204Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]     fn()
2025-12-04T13:24:33.6767485Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6767653Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6767933Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6768082Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6768358Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6768496Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]     with policy():
2025-12-04T13:24:33.6768777Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6768926Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]     raise RuntimeError(msg)
2025-12-04T13:24:33.6769420Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336.
2025-12-04T13:24:33.6769536Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.6769785Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6770159Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6770274Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.6770486Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6770677Z [rank3]:E1204 13:04:07.360000 437960 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T13:24:33.6770719Z dist init r=3, world=4
2025-12-04T13:24:33.6770858Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.6771019Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.6771305Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6771460Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]     getattr(self, test_name)()
2025-12-04T13:24:33.6771746Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6771902Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]     fn()
2025-12-04T13:24:33.6772178Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6772327Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6772607Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6772754Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6773035Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6773171Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]     with policy():
2025-12-04T13:24:33.6773451Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6773598Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]     raise RuntimeError(msg)
2025-12-04T13:24:33.6774094Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984.
2025-12-04T13:24:33.6774211Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.6774406Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6774781Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6774894Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.6775136Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6775301Z [rank2]:E1204 13:04:07.364000 437959 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T13:24:33.6775343Z dist init r=2, world=4
2025-12-04T13:24:33.6775480Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.6775640Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.6775927Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6776091Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]     getattr(self, test_name)()
2025-12-04T13:24:33.6776387Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6776511Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]     fn()
2025-12-04T13:24:33.6776789Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6776936Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6777217Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6777366Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6777643Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6777779Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]     with policy():
2025-12-04T13:24:33.6778057Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6778208Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]     raise RuntimeError(msg)
2025-12-04T13:24:33.6778702Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080.
2025-12-04T13:24:33.6778817Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.6779013Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6779406Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6779523Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.6779780Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6779946Z [rank0]:E1204 13:04:07.366000 437957 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T13:24:33.6779985Z dist init r=0, world=4
2025-12-04T13:24:33.6780329Z [rank0]:[W1204 13:04:07.083942389 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
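Rank 0 exits through the ProcessGroupNCCL warning about missing destroy_process_group(). A minimal sketch of the shutdown it asks for, assuming the launcher supplies the usual env:// rendezvous variables:

    # Minimal sketch of the teardown the warning asks for; assumes the
    # launcher has set MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE, and
    # the body of the try block is a placeholder.
    import torch.distributed as dist

    def main() -> None:
        dist.init_process_group(backend="nccl")
        try:
            pass  # training / test body goes here
        finally:
            dist.destroy_process_group()  # avoids the resource-leak warning at exit

    if __name__ == "__main__":
        main()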
2025-12-04T13:24:33.6780400Z FAILED [7.8140s] [100%]
2025-12-04T13:24:33.6780403Z
2025-12-04T13:24:33.6780460Z =================================== FAILURES ===================================
2025-12-04T13:24:33.6780575Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _
2025-12-04T13:24:33.6780621Z Traceback (most recent call last):
2025-12-04T13:24:33.6780785Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.6780829Z     self._join_processes(fn)
2025-12-04T13:24:33.6781003Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.6781056Z     self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.6781236Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.6781281Z     raise RuntimeError(error)
2025-12-04T13:24:33.6781361Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T13:24:33.6781405Z Traceback (most recent call last):
2025-12-04T13:24:33.6781567Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6781609Z     getattr(self, test_name)()
2025-12-04T13:24:33.6781767Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6781802Z     fn()
2025-12-04T13:24:33.6781954Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6781994Z     method(*args, **kwargs)
2025-12-04T13:24:33.6782145Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6782187Z     method(*args, **kwargs)
2025-12-04T13:24:33.6782340Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6782379Z     with policy():
2025-12-04T13:24:33.6782532Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6782572Z     raise RuntimeError(msg)
2025-12-04T13:24:33.6782941Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200.
2025-12-04T13:24:33.6782943Z
2025-12-04T13:24:33.6783020Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6783296Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6783300Z
2025-12-04T13:24:33.6783389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6783391Z
2025-12-04T13:24:33.6783449Z Process 2 exited with error code 10 and exception:
2025-12-04T13:24:33.6783495Z Traceback (most recent call last):
2025-12-04T13:24:33.6783657Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6783701Z     getattr(self, test_name)()
2025-12-04T13:24:33.6783860Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6783896Z     fn()
2025-12-04T13:24:33.6784045Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6784099Z     method(*args, **kwargs)
2025-12-04T13:24:33.6784263Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6784302Z     method(*args, **kwargs)
2025-12-04T13:24:33.6784452Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6784489Z     with policy():
2025-12-04T13:24:33.6784641Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6784682Z     raise RuntimeError(msg)
2025-12-04T13:24:33.6785051Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984.
2025-12-04T13:24:33.6785054Z
2025-12-04T13:24:33.6785129Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6785378Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6785381Z
2025-12-04T13:24:33.6785468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6785470Z
2025-12-04T13:24:33.6785529Z Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.6785573Z Traceback (most recent call last):
2025-12-04T13:24:33.6785736Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6785779Z     getattr(self, test_name)()
2025-12-04T13:24:33.6785938Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6785975Z     fn()
2025-12-04T13:24:33.6786125Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6786166Z     method(*args, **kwargs)
2025-12-04T13:24:33.6786316Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6786357Z     method(*args, **kwargs)
2025-12-04T13:24:33.6786506Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6786543Z     with policy():
2025-12-04T13:24:33.6786694Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6786736Z     raise RuntimeError(msg)
2025-12-04T13:24:33.6787123Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336.
2025-12-04T13:24:33.6787127Z
2025-12-04T13:24:33.6787201Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6787445Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6787449Z
2025-12-04T13:24:33.6787535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6787537Z
2025-12-04T13:24:33.6787539Z
2025-12-04T13:24:33.6787615Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.6787703Z Process 1 terminated with exit code 10, terminating remaining processes.
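The captured stdout above ("Process 1 terminated with exit code 10, terminating remaining processes.") is the parent harness joining its children and converting a nonzero child exit into the RuntimeError shown in the FAILURES block. An illustrative sketch of that behavior using only the public torch.multiprocessing API, not the torch.testing internals; exit code 10 and the four-process world size are copied from the log:

    # Illustrative sketch only; the real harness lives in
    # torch.testing._internal.common_distributed.
    import torch.multiprocessing as mp

    def _worker(rank: int) -> None:
        if rank == 1:
            raise SystemExit(10)  # stand-in for the leak-check failure path

    if __name__ == "__main__":
        ctx = mp.spawn(_worker, nprocs=4, join=False)
        ctx.join()  # raises ProcessExitedException once a child exits nonzero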
2025-12-04T13:24:33.6787939Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e8c0c54162ac7d55.xml -
2025-12-04T13:24:33.6788033Z =========================== short test summary info ============================
2025-12-04T13:24:33.6788298Z FAILED [7.8140s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T13:24:33.6788343Z Traceback (most recent call last):
2025-12-04T13:24:33.6788507Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6788548Z     getattr(self, test_name)()
2025-12-04T13:24:33.6788709Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6788743Z     fn()
2025-12-04T13:24:33.6788896Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6788937Z     method(*args, **kwargs)
2025-12-04T13:24:33.6789088Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6789129Z     method(*args, **kwargs)
2025-12-04T13:24:33.6789279Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6789316Z     with policy():
2025-12-04T13:24:33.6789468Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6789509Z     raise RuntimeError(msg)
2025-12-04T13:24:33.6789923Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200.
2025-12-04T13:24:33.6789926Z
2025-12-04T13:24:33.6790001Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6790248Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6790250Z
2025-12-04T13:24:33.6790339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6790341Z
2025-12-04T13:24:33.6790398Z Process 2 exited with error code 10 and exception:
2025-12-04T13:24:33.6790443Z Traceback (most recent call last):
2025-12-04T13:24:33.6790605Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6790648Z     getattr(self, test_name)()
2025-12-04T13:24:33.6790834Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6790871Z     fn()
2025-12-04T13:24:33.6791022Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6791061Z     method(*args, **kwargs)
2025-12-04T13:24:33.6791212Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6791251Z     method(*args, **kwargs)
2025-12-04T13:24:33.6791401Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6791436Z     with policy():
2025-12-04T13:24:33.6791587Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6791627Z     raise RuntimeError(msg)
2025-12-04T13:24:33.6791994Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984.
2025-12-04T13:24:33.6792024Z
2025-12-04T13:24:33.6792096Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6792343Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6792345Z
2025-12-04T13:24:33.6792431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6792435Z
2025-12-04T13:24:33.6792491Z Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.6792537Z Traceback (most recent call last):
2025-12-04T13:24:33.6792699Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6792745Z     getattr(self, test_name)()
2025-12-04T13:24:33.6792904Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6792939Z     fn()
2025-12-04T13:24:33.6793089Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6793129Z     method(*args, **kwargs)
2025-12-04T13:24:33.6793278Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6793319Z     method(*args, **kwargs)
2025-12-04T13:24:33.6793468Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.6793505Z     with policy():
2025-12-04T13:24:33.6793657Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.6793700Z     raise RuntimeError(msg)
2025-12-04T13:24:33.6794064Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336.
2025-12-04T13:24:33.6794066Z
2025-12-04T13:24:33.6794140Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.6794383Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6794388Z
2025-12-04T13:24:33.6794473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.6794538Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T13:24:33.6794621Z ======================= 1 failed, 20 deselected in 7.97s =======================
2025-12-04T13:24:33.6794662Z Got exit code 1
2025-12-04T13:24:33.6794702Z Retrying single test...
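The same repro command is printed before each retry. A convenience wrapper for running it with standard subprocess calls; the command and the two environment variables are verbatim from the output above, and per the log it must be run from the base repo dir:

    # Command and env-var names are verbatim from the log above; run from
    # the base repo dir. Per the log, PYTORCH_PRINT_REPRO_ON_FAILURE=0 would
    # silence the repro hint itself.
    import os
    import subprocess

    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
    )
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda",
        ],
        env=env,
        check=True,
    )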
2025-12-04T13:24:33.6794893Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c14d64351c76376.xml
2025-12-04T13:24:33.6794951Z ============================= test session starts ==============================
2025-12-04T13:24:33.6795066Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.6795107Z cachedir: .pytest_cache
2025-12-04T13:24:33.6795266Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.6795313Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.6795354Z configfile: pytest.ini
2025-12-04T13:24:33.6795518Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.6795614Z collecting ... collected 60 items / 20 deselected / 40 selected
2025-12-04T13:24:33.6795854Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda
2025-12-04T13:24:33.6795900Z Running 1 items in this shard
2025-12-04T13:24:33.6795902Z
2025-12-04T13:24:33.6796224Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 13:04:11.360000 438274 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 438343
2025-12-04T13:24:33.6796379Z I1204 13:04:11.361000 438274 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 438344
2025-12-04T13:24:33.6796535Z I1204 13:04:11.361000 438274 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 438345
2025-12-04T13:24:33.6796685Z I1204 13:04:11.362000 438274 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 438346
2025-12-04T13:24:33.6797046Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance)
2025-12-04T13:24:33.6797094Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.6797386Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-12-04T13:24:33.6797448Z {}
2025-12-04T13:24:33.6797555Z These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-12-04T13:24:33.6797629Z _warn_on_overridden_mixed_precision(overridden_module_classes)
2025-12-04T13:24:33.6798123Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.6798185Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.6798539Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance)
2025-12-04T13:24:33.6798588Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.6798893Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-12-04T13:24:33.6798958Z {}
2025-12-04T13:24:33.6799061Z These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-12-04T13:24:33.6799133Z _warn_on_overridden_mixed_precision(overridden_module_classes)
2025-12-04T13:24:33.6799622Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.6799684Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.6800101Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance)
2025-12-04T13:24:33.6800162Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.6800449Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-12-04T13:24:33.6800510Z {}
2025-12-04T13:24:33.6800612Z These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-12-04T13:24:33.6800682Z _warn_on_overridden_mixed_precision(overridden_module_classes)
2025-12-04T13:24:33.6801172Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.6801232Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.6801588Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True (use batch_first for better inference performance)
2025-12-04T13:24:33.6801636Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.6801925Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type:
2025-12-04T13:24:33.6801989Z {}
2025-12-04T13:24:33.6802090Z These modules will be wrapped as separate FSDP instances with mixed precision disabled.
2025-12-04T13:24:33.6802162Z _warn_on_overridden_mixed_precision(overridden_module_classes)
2025-12-04T13:24:33.6802648Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.6802708Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.6802852Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.6803044Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.6803337Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.6803492Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]     getattr(self, test_name)()
2025-12-04T13:24:33.6803781Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.6803907Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]     fn()
2025-12-04T13:24:33.6804187Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6804362Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6804639Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.6804787Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
2025-12-04T13:24:33.6805063Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6805203Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6805484Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6805633Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6806128Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6806248Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6806446Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6806820Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6806937Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6807150Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6807333Z [rank3]:E1204 13:04:17.471000 438346 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6807375Z dist init r=3, world=4 2025-12-04T13:24:33.6807515Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6807676Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6807965Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6808119Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6808407Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6808551Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6808827Z [rank1]:E1204 13:04:17.487000 438344 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6808977Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6809252Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6809402Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6809682Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6809863Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6810145Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6810293Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6810788Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:24:33.6810905Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6811100Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6811475Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6811615Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6811830Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6811993Z [rank1]:E1204 13:04:17.487000 438344 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6812033Z dist init r=1, world=4 2025-12-04T13:24:33.6812172Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6812333Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6812624Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6812793Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6813094Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6813219Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6813497Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6813644Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6813923Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6814070Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6814348Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6814485Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6814764Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6814915Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6815410Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:24:33.6815526Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6815721Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6816114Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6816230Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6816442Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6816606Z [rank0]:E1204 13:04:17.500000 438343 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6816644Z dist init r=0, world=4 2025-12-04T13:24:33.6816783Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6816953Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6817252Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6817405Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6817694Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6817818Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6818096Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6818245Z [rank2]:E1204 13:04:17.567000 438345 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6818521Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6818668Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6818944Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6819084Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6819364Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6819511Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6820041Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:24:33.6820187Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6820385Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6820760Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6820873Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6821086Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6821250Z [rank2]:E1204 13:04:17.567000 438345 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6821316Z dist init r=2, world=4 2025-12-04T13:24:33.6821652Z [rank0]:[W1204 13:04:17.220302305 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6821693Z FAILED [7.8134s] [100%] 2025-12-04T13:24:33.6821695Z 2025-12-04T13:24:33.6821751Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6821867Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _ 2025-12-04T13:24:33.6821913Z Traceback (most recent call last): 2025-12-04T13:24:33.6822078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6822123Z self._join_processes(fn) 2025-12-04T13:24:33.6822299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6822354Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6822532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6822577Z raise RuntimeError(error) 2025-12-04T13:24:33.6822656Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6822702Z Traceback (most recent call last): 2025-12-04T13:24:33.6822862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6822907Z getattr(self, test_name)() 2025-12-04T13:24:33.6823065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6823102Z fn() 2025-12-04T13:24:33.6823376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6823418Z method(*args, **kwargs) 2025-12-04T13:24:33.6823569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6823611Z method(*args, **kwargs) 2025-12-04T13:24:33.6823762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6823803Z with policy(): 2025-12-04T13:24:33.6823955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6823997Z raise RuntimeError(msg) 2025-12-04T13:24:33.6824392Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
2025-12-04T13:24:33.6824398Z 2025-12-04T13:24:33.6824474Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6824723Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6824725Z 2025-12-04T13:24:33.6824813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6824815Z 2025-12-04T13:24:33.6824816Z 2025-12-04T13:24:33.6824893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6824981Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6825217Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c14d64351c76376.xml - 2025-12-04T13:24:33.6825301Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6825565Z FAILED [7.8134s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6825610Z Traceback (most recent call last): 2025-12-04T13:24:33.6825774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6825819Z getattr(self, test_name)() 2025-12-04T13:24:33.6825978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6826014Z fn() 2025-12-04T13:24:33.6826168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6826210Z method(*args, **kwargs) 2025-12-04T13:24:33.6826362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6826403Z method(*args, **kwargs) 2025-12-04T13:24:33.6826552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6826590Z with policy(): 2025-12-04T13:24:33.6826741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6826783Z raise RuntimeError(msg) 2025-12-04T13:24:33.6827154Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:24:33.6827158Z 2025-12-04T13:24:33.6827234Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6827479Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6827481Z 2025-12-04T13:24:33.6827568Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6827632Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
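[editor's note] The leak report repeated above is a straight before/after comparison of two counters: the CUDA caching allocator's allocated bytes and the driver-level allocation on the device. A minimal sketch of that comparison, for illustration only; the real checker lives in torch/testing/_internal/common_utils.py and differs in detail:

import torch

def run_with_leak_check(test_fn, device: int = 0) -> None:
    # Snapshot caching-allocator and driver-level usage before the test.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)

    test_fn()

    # Drop cached blocks so only true leaks remain, then re-measure.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {alloc_before} "
            f"and is now reported as {alloc_after} on device {device}."
        )
    # Driver-allocated bytes are (total - free); growth there can flag
    # allocations that bypass the caching allocator (e.g. NCCL/RCCL buffers).
    if free_after < free_before:
        raise RuntimeError(
            f"CUDA driver allocated memory was {total - free_before} "
            f"and is now {total - free_after} on device {device}."
        )

In the failures above the allocator delta is tiny (512 B to ~13 kB) while the driver delta is roughly 0.8 GB per rank, a pattern more consistent with runtime or communicator state left alive than with stray tensors.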
2025-12-04T13:24:33.6827693Z ======================= 1 failed, 20 deselected in 7.97s ======================= 2025-12-04T13:24:33.6827731Z Got exit code 1 2025-12-04T13:24:33.6827927Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T13:24:33.6828078Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.6828269Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9cee0414ee9bd595.xml 2025-12-04T13:24:33.6828329Z ============================= test session starts ============================== 2025-12-04T13:24:33.6828439Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6828482Z cachedir: .pytest_cache 2025-12-04T13:24:33.6828639Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6828687Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6828727Z configfile: pytest.ini 2025-12-04T13:24:33.6828889Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6828975Z collecting ... collected 60 items / 3 deselected / 57 selected 2025-12-04T13:24:33.6829041Z stepcurrent: skipping 3 already run items. 2025-12-04T13:24:33.6829084Z Running 18 items in this shard 2025-12-04T13:24:33.6829086Z 2025-12-04T13:24:33.6829398Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 13:04:21.736000 438660 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 438729 2025-12-04T13:24:33.6829554Z I1204 13:04:21.736000 438660 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 438730 2025-12-04T13:24:33.6829753Z I1204 13:04:21.737000 438660 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 438731 2025-12-04T13:24:33.6829906Z I1204 13:04:21.737000 438660 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 438732 2025-12-04T13:24:33.6830491Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6830533Z _warn_cpu_init() 2025-12-04T13:24:33.6831103Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.6831143Z _warn_cpu_init() 2025-12-04T13:24:33.6831714Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6831752Z _warn_cpu_init() 2025-12-04T13:24:33.6832046Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.6832088Z return func(*args, **kwargs) 2025-12-04T13:24:33.6832685Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6832725Z _warn_cpu_init() 2025-12-04T13:24:33.6832869Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6833033Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6833323Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6833495Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6833800Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6833925Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6834202Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6834351Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6834632Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6834782Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6835060Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6835196Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6835476Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6835626Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6836117Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:24:33.6836234Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6836428Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6836817Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6836934Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6837148Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6837311Z [rank2]:E1204 13:05:17.394000 438731 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6837351Z dist init r=2, world=4 2025-12-04T13:24:33.6837489Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6837652Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6837951Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6838117Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6838408Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6838533Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6838812Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6838961Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6839240Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6839388Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6839666Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6839839Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6840119Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6840269Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6840751Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:24:33.6840867Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6841092Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6841452Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6841567Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6841778Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6841943Z [rank0]:E1204 13:05:17.438000 438729 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6841996Z dist init r=0, world=4 2025-12-04T13:24:33.6842136Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6842307Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6842596Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6842750Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6843035Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6843162Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6843439Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6843587Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6843864Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6844012Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6844295Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6844432Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6844711Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6844858Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6845358Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:24:33.6845475Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6845671Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6846030Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6846143Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6846358Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6846541Z [rank3]:E1204 13:05:17.439000 438732 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6846582Z dist init r=3, world=4 2025-12-04T13:24:33.6846719Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6846879Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6847168Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6847322Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6847610Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6847734Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6848012Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6848158Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.6848437Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6848584Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6848861Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6848998Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6849278Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6849447Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6850210Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:24:33.6850326Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6850522Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6850882Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6851030Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6851241Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6851407Z [rank1]:E1204 13:05:17.454000 438730 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6851445Z dist init r=1, world=4 2025-12-04T13:24:33.6851785Z [rank0]:[W1204 13:05:17.181356392 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6851827Z FAILED [57.5568s] [ 5%] 2025-12-04T13:24:33.6851831Z 2025-12-04T13:24:33.6851888Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6851992Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T13:24:33.6852040Z Traceback (most recent call last): 2025-12-04T13:24:33.6852205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6852248Z self._join_processes(fn) 2025-12-04T13:24:33.6852423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6852476Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6852655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6852701Z raise RuntimeError(error) 2025-12-04T13:24:33.6852785Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6852830Z Traceback (most recent call last): 2025-12-04T13:24:33.6852993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6853035Z getattr(self, test_name)() 2025-12-04T13:24:33.6853194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6853229Z fn() 2025-12-04T13:24:33.6853381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6853422Z method(*args, **kwargs) 2025-12-04T13:24:33.6853574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6853614Z method(*args, **kwargs) 2025-12-04T13:24:33.6853791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6853830Z with policy(): 2025-12-04T13:24:33.6853984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6854024Z raise RuntimeError(msg) 2025-12-04T13:24:33.6854383Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
2025-12-04T13:24:33.6854385Z 2025-12-04T13:24:33.6854461Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6854694Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6854707Z 2025-12-04T13:24:33.6854797Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6854814Z 2025-12-04T13:24:33.6854816Z 2025-12-04T13:24:33.6854890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6854979Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6855212Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9cee0414ee9bd595.xml - 2025-12-04T13:24:33.6855273Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6855524Z FAILED [57.5568s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6855571Z Traceback (most recent call last): 2025-12-04T13:24:33.6855739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6855781Z getattr(self, test_name)() 2025-12-04T13:24:33.6855942Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6855977Z fn() 2025-12-04T13:24:33.6856129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6856168Z method(*args, **kwargs) 2025-12-04T13:24:33.6856320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6856359Z method(*args, **kwargs) 2025-12-04T13:24:33.6856512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6856550Z with policy(): 2025-12-04T13:24:33.6856704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6856744Z raise RuntimeError(msg) 2025-12-04T13:24:33.6857099Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:24:33.6857101Z 2025-12-04T13:24:33.6857175Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6857407Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6857410Z 2025-12-04T13:24:33.6857497Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6857579Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
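[editor's note] The _warn_cpu_init() UserWarning that precedes the run above names its own fix: pass device_id to FSDP so the wrapped module is moved to the GPU before sharding initialization, which is also a prerequisite for sync_module_states=True. An illustrative sketch, with a hypothetical nn.Linear standing in for the test model:

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def build_model(rank: int) -> FSDP:
    model = nn.Linear(1024, 1024)  # hypothetical stand-in for the test model
    # device_id moves the CPU-resident module to cuda:<rank> for sharding
    # init, silencing the warning and enabling sync_module_states=True.
    return FSDP(model, device_id=rank, sync_module_states=True)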
2025-12-04T13:24:33.6857643Z ======================= 1 failed, 3 deselected in 57.71s ======================= 2025-12-04T13:24:33.6857681Z Got exit code 1 2025-12-04T13:24:33.6857723Z Retrying single test... 2025-12-04T13:24:33.6857910Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-196977a439cbea99.xml 2025-12-04T13:24:33.6857967Z ============================= test session starts ============================== 2025-12-04T13:24:33.6858081Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6858123Z cachedir: .pytest_cache 2025-12-04T13:24:33.6858283Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6858331Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6858382Z configfile: pytest.ini 2025-12-04T13:24:33.6858546Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6858632Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.6858857Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6858901Z Running 1 items in this shard 2025-12-04T13:24:33.6858903Z 2025-12-04T13:24:33.6859212Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 13:05:21.670000 439062 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 439131 2025-12-04T13:24:33.6859365Z I1204 13:05:21.671000 439062 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 439132 2025-12-04T13:24:33.6859522Z I1204 13:05:21.671000 439062 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 439133 2025-12-04T13:24:33.6859676Z I1204 13:05:21.672000 439062 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 439134 2025-12-04T13:24:33.6860291Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6860331Z _warn_cpu_init() 2025-12-04T13:24:33.6860899Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.6860939Z _warn_cpu_init() 2025-12-04T13:24:33.6861505Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6861541Z _warn_cpu_init() 2025-12-04T13:24:33.6861863Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.6861908Z return func(*args, **kwargs) 2025-12-04T13:24:33.6862477Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6862514Z _warn_cpu_init() 2025-12-04T13:24:33.6862657Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6862821Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6863121Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6863293Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6863579Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6863706Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6863985Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6864136Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6864416Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6864563Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6864842Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6864980Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6865260Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6865408Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6865893Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:24:33.6866010Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6866225Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6866587Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6866701Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6866916Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6867079Z [rank0]:E1204 13:06:17.473000 439131 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6867137Z dist init r=0, world=4 2025-12-04T13:24:33.6867275Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6867449Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6867736Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6867889Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6868175Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6868301Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6868579Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6868725Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6869004Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6869152Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6869429Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6869567Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6869878Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6870027Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6870533Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
2025-12-04T13:24:33.6870652Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6870848Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6871206Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6871321Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6871535Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6871729Z [rank2]:E1204 13:06:17.524000 439133 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6871867Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6872027Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6872314Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6872466Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6872755Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6872878Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6873156Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6873303Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6873582Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6873730Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6874008Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6874146Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6874423Z [rank1]:E1204 13:06:17.525000 439132 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6874572Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6875073Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:24:33.6875189Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6875385Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6875743Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6875877Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6876087Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6876251Z [rank1]:E1204 13:06:17.525000 439132 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6876289Z dist init r=2, world=4 2025-12-04T13:24:33.6876329Z dist init r=1, world=4 2025-12-04T13:24:33.6876467Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6876628Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6876915Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6877070Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6877355Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6877479Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6877756Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6877906Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:24:33.6878185Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6878330Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6878607Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6878744Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6879043Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6879194Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6879672Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:24:33.6879830Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6880027Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6880412Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6880525Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6880735Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6880900Z [rank3]:E1204 13:06:17.526000 439134 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6880938Z dist init r=3, world=4 2025-12-04T13:24:33.6881281Z [rank0]:[W1204 13:06:17.161632667 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6881322Z FAILED [57.6576s] [100%] 2025-12-04T13:24:33.6881325Z 2025-12-04T13:24:33.6881381Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6881482Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T13:24:33.6881529Z Traceback (most recent call last): 2025-12-04T13:24:33.6881692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6881735Z self._join_processes(fn) 2025-12-04T13:24:33.6881909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6881964Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6882143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6882187Z raise RuntimeError(error) 2025-12-04T13:24:33.6882268Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.6882312Z Traceback (most recent call last): 2025-12-04T13:24:33.6882474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6882516Z getattr(self, test_name)() 2025-12-04T13:24:33.6882674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6882708Z fn() 2025-12-04T13:24:33.6882860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6882933Z method(*args, **kwargs) 2025-12-04T13:24:33.6883088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6883128Z method(*args, **kwargs) 2025-12-04T13:24:33.6883279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6883316Z with policy(): 2025-12-04T13:24:33.6883471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6883513Z raise RuntimeError(msg) 2025-12-04T13:24:33.6883965Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:24:33.6883985Z 2025-12-04T13:24:33.6884063Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6884306Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6884308Z 2025-12-04T13:24:33.6884396Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6884398Z 2025-12-04T13:24:33.6884400Z 2025-12-04T13:24:33.6884474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6884563Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.6884794Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-196977a439cbea99.xml - 2025-12-04T13:24:33.6884856Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6885110Z FAILED [57.6576s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.6885158Z Traceback (most recent call last): 2025-12-04T13:24:33.6885322Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6885364Z getattr(self, test_name)() 2025-12-04T13:24:33.6885524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6885558Z fn() 2025-12-04T13:24:33.6885711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6885750Z method(*args, **kwargs) 2025-12-04T13:24:33.6885904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6885945Z method(*args, **kwargs) 2025-12-04T13:24:33.6886096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6886133Z with policy(): 2025-12-04T13:24:33.6886285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6886325Z raise RuntimeError(msg) 2025-12-04T13:24:33.6886678Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:24:33.6886681Z 2025-12-04T13:24:33.6886755Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6887008Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6887012Z 2025-12-04T13:24:33.6887101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6887162Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
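[Editor's note] The `with policy():` frames in these tracebacks come from the CUDA memory-leak-check policy enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: allocator statistics are snapshotted before the test body and compared on exit, and a RuntimeError is raised if memory is still held. A minimal sketch of that general pattern (hypothetical class name; the real check lives in torch/testing/_internal/common_utils.py):

    import torch

    class CudaLeakCheck:
        # Sketch of the snapshot-and-compare pattern, not PyTorch's
        # actual CudaMemoryLeakCheck implementation.
        def __enter__(self):
            torch.cuda.synchronize()
            self.before = [torch.cuda.memory_allocated(d)
                           for d in range(torch.cuda.device_count())]
            return self

        def __exit__(self, exc_type, exc, tb):
            if exc_type is not None:
                return False  # let the test's own failure propagate
            torch.cuda.synchronize()
            torch.cuda.empty_cache()  # drop cached blocks so only live tensors count
            for dev, before in enumerate(self.before):
                after = torch.cuda.memory_allocated(dev)
                if after > before:
                    raise RuntimeError(
                        f"possible leak on device {dev}: {before} -> {after} bytes")
            return False

Used as `with CudaLeakCheck(): run_test()`, which mirrors how the policy wraps each test method in the tracebacks above.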
2025-12-04T13:24:33.6887227Z ====================== 1 failed, 20 deselected in 57.82s ======================= 2025-12-04T13:24:33.6887264Z Got exit code 1 2025-12-04T13:24:33.6887305Z Retrying single test... 2025-12-04T13:24:33.6887493Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1f9bd8cc6463ee93.xml 2025-12-04T13:24:33.6887552Z ============================= test session starts ============================== 2025-12-04T13:24:33.6887665Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6887719Z cachedir: .pytest_cache 2025-12-04T13:24:33.6887878Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6887938Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6887978Z configfile: pytest.ini 2025-12-04T13:24:33.6888140Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6888214Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.6888440Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6888484Z Running 1 items in this shard 2025-12-04T13:24:33.6888486Z 2025-12-04T13:24:33.6888797Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 13:06:21.827000 439464 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 439533 2025-12-04T13:24:33.6888953Z I1204 13:06:21.828000 439464 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 439534 2025-12-04T13:24:33.6889105Z I1204 13:06:21.828000 439464 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 439535 2025-12-04T13:24:33.6889257Z I1204 13:06:21.829000 439464 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 439536 2025-12-04T13:24:33.6889882Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6889923Z _warn_cpu_init() 2025-12-04T13:24:33.6890487Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.6890526Z _warn_cpu_init() 2025-12-04T13:24:33.6891118Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6891157Z _warn_cpu_init() 2025-12-04T13:24:33.6891720Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6891756Z _warn_cpu_init() 2025-12-04T13:24:33.6892047Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.6892106Z return func(*args, **kwargs) 2025-12-04T13:24:33.6892250Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6892431Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6892720Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6892876Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6893161Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6893289Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6893567Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6893717Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6893994Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6894141Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6894422Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6894558Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6894837Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6894985Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6895527Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:24:33.6895646Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6895841Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6896200Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6896313Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6896528Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6896713Z [rank2]:E1204 13:07:17.619000 439535 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6896752Z dist init r=2, world=4 2025-12-04T13:24:33.6896890Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6897051Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6897338Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6897493Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6897780Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6897904Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6898180Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6898327Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6898606Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6898757Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6899032Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6899168Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6899446Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6899618Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6900143Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:24:33.6900259Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6900456Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6900819Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6900965Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6901176Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6901342Z [rank0]:E1204 13:07:17.662000 439533 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6901380Z dist init r=0, world=4 2025-12-04T13:24:33.6901519Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6901677Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6901966Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6902121Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6902405Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6902530Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6902806Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6902957Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6903234Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6903381Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6903658Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6903794Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6904097Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6904246Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6904726Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:24:33.6906847Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6907057Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6907450Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6907565Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6907779Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6907943Z [rank3]:E1204 13:07:17.665000 439536 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6907997Z dist init r=3, world=4 2025-12-04T13:24:33.6908137Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6908299Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6908586Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6908741Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6909026Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6909153Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6909434Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6909589Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.6909903Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6910050Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6910345Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6910484Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6910762Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6910911Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6911460Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:24:33.6911604Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6911799Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6912158Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6912269Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6912485Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6912651Z [rank1]:E1204 13:07:17.670000 439534 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6912690Z dist init r=1, world=4 2025-12-04T13:24:33.6913028Z [rank0]:[W1204 13:07:17.413278229 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6913069Z FAILED [57.8583s] [100%] 2025-12-04T13:24:33.6913071Z 2025-12-04T13:24:33.6913130Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6913230Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T13:24:33.6913279Z Traceback (most recent call last): 2025-12-04T13:24:33.6913443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6913489Z self._join_processes(fn) 2025-12-04T13:24:33.6913661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6913716Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6913895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6913940Z raise RuntimeError(error) 2025-12-04T13:24:33.6914019Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.6914066Z Traceback (most recent call last): 2025-12-04T13:24:33.6914227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6914271Z getattr(self, test_name)() 2025-12-04T13:24:33.6914446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6914482Z fn() 2025-12-04T13:24:33.6914636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6914677Z method(*args, **kwargs) 2025-12-04T13:24:33.6914831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6914870Z method(*args, **kwargs) 2025-12-04T13:24:33.6915021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6915058Z with policy(): 2025-12-04T13:24:33.6915228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6915269Z raise RuntimeError(msg) 2025-12-04T13:24:33.6915650Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:24:33.6915663Z 2025-12-04T13:24:33.6915737Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6915971Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6915974Z 2025-12-04T13:24:33.6916060Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6916064Z 2025-12-04T13:24:33.6916122Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6916168Z Traceback (most recent call last): 2025-12-04T13:24:33.6916332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6916377Z getattr(self, test_name)() 2025-12-04T13:24:33.6916535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6916570Z fn() 2025-12-04T13:24:33.6916719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6916761Z method(*args, **kwargs) 2025-12-04T13:24:33.6916910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6916951Z method(*args, **kwargs) 2025-12-04T13:24:33.6917101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6917140Z with policy(): 2025-12-04T13:24:33.6917291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6917334Z raise RuntimeError(msg) 2025-12-04T13:24:33.6917687Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
2025-12-04T13:24:33.6917690Z 2025-12-04T13:24:33.6917764Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6917997Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6918000Z 2025-12-04T13:24:33.6918086Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6918089Z 2025-12-04T13:24:33.6918149Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6918204Z Traceback (most recent call last): 2025-12-04T13:24:33.6918368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6918410Z getattr(self, test_name)() 2025-12-04T13:24:33.6918570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6918604Z fn() 2025-12-04T13:24:33.6918755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6918794Z method(*args, **kwargs) 2025-12-04T13:24:33.6918944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6918983Z method(*args, **kwargs) 2025-12-04T13:24:33.6919147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6919206Z with policy(): 2025-12-04T13:24:33.6919359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6919399Z raise RuntimeError(msg) 2025-12-04T13:24:33.6919786Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:24:33.6919788Z 2025-12-04T13:24:33.6919862Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6920093Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6920097Z 2025-12-04T13:24:33.6920185Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6920190Z 2025-12-04T13:24:33.6920191Z 2025-12-04T13:24:33.6920267Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.6920356Z Process 0 terminated with exit code 10, terminating remaining processes. 
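[Editor's note] "Process 0 terminated with exit code 10, terminating remaining processes." is the multiprocess harness at work: each rank runs the test in its own child process, and `_join_processes`/`_check_return_codes` turn any nonzero child exit code into the RuntimeError seen above. A stripped-down sketch of that shape (hypothetical worker, using torch.multiprocessing):

    import sys
    import torch.multiprocessing as mp

    def _worker(rank: int, world_size: int) -> None:
        try:
            pass  # per-rank test body would run here
        except Exception:
            sys.exit(10)  # mirrors "exiting process N with exit code: 10" above

    if __name__ == "__main__":
        # join=True makes mp.spawn raise if any child exits nonzero,
        # the analogue of _check_return_codes raising RuntimeError.
        mp.spawn(_worker, args=(4,), nprocs=4, join=True)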
2025-12-04T13:24:33.6920589Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1f9bd8cc6463ee93.xml - 2025-12-04T13:24:33.6920651Z =========================== short test summary info ============================ 2025-12-04T13:24:33.6920905Z FAILED [57.8583s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.6920952Z Traceback (most recent call last): 2025-12-04T13:24:33.6921116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6921160Z getattr(self, test_name)() 2025-12-04T13:24:33.6921320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6921355Z fn() 2025-12-04T13:24:33.6921506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6921546Z method(*args, **kwargs) 2025-12-04T13:24:33.6921695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6921734Z method(*args, **kwargs) 2025-12-04T13:24:33.6921883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6921920Z with policy(): 2025-12-04T13:24:33.6922088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6922130Z raise RuntimeError(msg) 2025-12-04T13:24:33.6922483Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:24:33.6922485Z 2025-12-04T13:24:33.6922557Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6922788Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6922791Z 2025-12-04T13:24:33.6922876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6922896Z 2025-12-04T13:24:33.6922954Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6923017Z Traceback (most recent call last): 2025-12-04T13:24:33.6923195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6923237Z getattr(self, test_name)() 2025-12-04T13:24:33.6923396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6923431Z fn() 2025-12-04T13:24:33.6923580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6923621Z method(*args, **kwargs) 2025-12-04T13:24:33.6923770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6923810Z method(*args, **kwargs) 2025-12-04T13:24:33.6923960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6924000Z with policy(): 2025-12-04T13:24:33.6924151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6924192Z raise RuntimeError(msg) 2025-12-04T13:24:33.6924545Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
2025-12-04T13:24:33.6924547Z 2025-12-04T13:24:33.6924621Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6924852Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6924857Z 2025-12-04T13:24:33.6924942Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6924945Z 2025-12-04T13:24:33.6925005Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.6925051Z Traceback (most recent call last): 2025-12-04T13:24:33.6925214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6925255Z getattr(self, test_name)() 2025-12-04T13:24:33.6925417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6925451Z fn() 2025-12-04T13:24:33.6925603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6925641Z method(*args, **kwargs) 2025-12-04T13:24:33.6925793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6925844Z method(*args, **kwargs) 2025-12-04T13:24:33.6925995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6926031Z with policy(): 2025-12-04T13:24:33.6926183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6926223Z raise RuntimeError(msg) 2025-12-04T13:24:33.6926576Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:24:33.6926578Z 2025-12-04T13:24:33.6926651Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6926892Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6926914Z 2025-12-04T13:24:33.6927000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6927063Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
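[Editor's note] The repro block printed with each failure is directly actionable: from a PyTorch checkout, the two environment variables re-enable ROCm test mode and the leak-check policy for a single-test run, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 silences the block itself. A sketch of driving that command from Python (the command line is copied verbatim from the log):

    import os
    import subprocess

    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",            # run the CUDA-marked test on ROCm
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",  # enable the leak-check policy
    )
    subprocess.run(
        ["python", "test/distributed/fsdp/test_fsdp_core.py",
         "TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda"],
        env=env, check=True,
    )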
2025-12-04T13:24:33.6927127Z ====================== 1 failed, 20 deselected in 58.02s ======================= 2025-12-04T13:24:33.6927164Z Got exit code 1 2025-12-04T13:24:33.6927347Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:24:33.6927475Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.6927666Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-613551715bb88517.xml 2025-12-04T13:24:33.6927725Z ============================= test session starts ============================== 2025-12-04T13:24:33.6927839Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.6927880Z cachedir: .pytest_cache 2025-12-04T13:24:33.6928040Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.6928086Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.6928127Z configfile: pytest.ini 2025-12-04T13:24:33.6928288Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.6928361Z collecting ... collected 60 items / 4 deselected / 56 selected 2025-12-04T13:24:33.6928416Z stepcurrent: skipping 4 already run items. 2025-12-04T13:24:33.6928459Z Running 17 items in this shard 2025-12-04T13:24:33.6928461Z 2025-12-04T13:24:33.6928782Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 13:07:22.043000 439866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 439935 2025-12-04T13:24:33.6928939Z I1204 13:07:22.044000 439866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 439936 2025-12-04T13:24:33.6929093Z I1204 13:07:22.045000 439866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 439937 2025-12-04T13:24:33.6929242Z I1204 13:07:22.046000 439866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 439938 2025-12-04T13:24:33.6930057Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6930097Z _warn_cpu_init() 2025-12-04T13:24:33.6930404Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.6930443Z _init_core_state( 2025-12-04T13:24:33.6930934Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6931018Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6931604Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6931656Z _warn_cpu_init() 2025-12-04T13:24:33.6931954Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.6931992Z _init_core_state( 2025-12-04T13:24:33.6932483Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6932545Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6933116Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.6933152Z _warn_cpu_init() 2025-12-04T13:24:33.6933451Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.6933490Z _init_core_state( 2025-12-04T13:24:33.6933977Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6934036Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6934614Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.6934654Z _warn_cpu_init() 2025-12-04T13:24:33.6935142Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6935200Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6935697Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6935776Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6936075Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.6936111Z _init_core_state( 2025-12-04T13:24:33.6936597Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6936655Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6937141Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.6937199Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.6938483Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
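[Editor's note] The cluster of FSDP warnings above all point at the same two fixes they recommend: set the current device per rank (or pass an indexed device) so a bare "cuda" is unambiguous, and hand FSDP a `device_id` so sharding initialization happens on the GPU rather than the CPU. A hedged sketch (placeholder module; only `device_id` and `torch.cuda.set_device` come from the warning text itself):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_for_rank(rank: int) -> FSDP:
        device = torch.device("cuda", rank)   # explicit index, not bare "cuda"
        torch.cuda.set_device(device)         # what the warning asks for
        model = nn.Linear(1024, 1024)         # placeholder CPU module
        # device_id moves the module to the GPU for sharding init and is
        # required for sync_module_states=True per the warning text.
        return FSDP(model, device_id=device, sync_module_states=True)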
2025-12-04T13:24:33.6938613Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.6939928Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.6940054Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.6941327Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.6941475Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.6942738Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
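The repeated AccumulateGrad stream-mismatch warnings above name their own escape hatch: if the mismatch is intentional, a single call disables the check. A one-line sketch:

import torch

# Suppresses the AccumulateGrad stream-mismatch UserWarning quoted above.
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)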
2025-12-04T13:24:33.6942860Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.6943091Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.6943135Z return func(*args, **kwargs) 2025-12-04T13:24:33.6943359Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.6943401Z return func(*args, **kwargs) 2025-12-04T13:24:33.6943622Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.6943664Z return func(*args, **kwargs) 2025-12-04T13:24:33.6943895Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.6943938Z return func(*args, **kwargs) 2025-12-04T13:24:33.6944161Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.6944201Z return func(*args, **kwargs) 2025-12-04T13:24:33.6944419Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.6944459Z return func(*args, **kwargs) 2025-12-04T13:24:33.6944677Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.6944717Z return func(*args, **kwargs) 2025-12-04T13:24:33.6944947Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.6945010Z return func(*args, **kwargs) 2025-12-04T13:24:33.6945301Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
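The barrier() warning at the end of the block above likewise suggests its own fix: bind the process group to a concrete device at init time. A sketch, assuming one GPU per rank and a launcher-provided LOCAL_RANK:

import os
import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])  # assumption: set by the launcher
# With an explicit device_id, barrier() no longer has to guess a device.
dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))
dist.barrier()
dist.destroy_process_group()  # explicit teardown; see also the NCCL warnings below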
2025-12-04T13:24:33.6945342Z return func(*args, **kwargs) 2025-12-04T13:24:33.6945488Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6945651Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6945945Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6946104Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6946395Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6946522Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6946802Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6946954Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6947233Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6947382Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6947659Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6947796Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6948092Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6948242Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6948738Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 
2025-12-04T13:24:33.6948853Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6949051Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6949433Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.6955852Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6956102Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6956274Z [rank2]:E1204 13:07:30.546000 439937 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.6956317Z dist init r=2, world=4 2025-12-04T13:24:33.6956465Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6956631Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6956929Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6957088Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6957380Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6957511Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6957796Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6957951Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6958231Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6958381Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6958661Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6958859Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.6959141Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6959293Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6959881Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17500733440. 2025-12-04T13:24:33.6960004Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6960234Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6960609Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.6960726Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6960941Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6961115Z [rank1]:E1204 13:07:30.565000 439936 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.6961159Z dist init r=1, world=4 2025-12-04T13:24:33.6961301Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6961461Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6961753Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6961909Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6962199Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6962329Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6962610Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6962762Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.6963041Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6963191Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6963483Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6963623Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6963904Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6964054Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6964558Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17637048320. 2025-12-04T13:24:33.6964698Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6964898Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6965267Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.6965383Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6965598Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6965764Z [rank0]:E1204 13:07:30.606000 439935 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.6965805Z dist init r=0, world=4 2025-12-04T13:24:33.6965943Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.6966105Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.6966393Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6966552Z [rank3]:E1204 13:07:30.611000 439938 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.6966840Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6966969Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.6967250Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6967400Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6967691Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6967841Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.6968119Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6968255Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.6968548Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6968711Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.6969212Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17433624576. 
2025-12-04T13:24:33.6969329Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6969527Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6969940Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.6970057Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.6970271Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6970438Z [rank3]:E1204 13:07:30.611000 439938 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.6970478Z dist init r=3, world=4 2025-12-04T13:24:33.6970823Z [rank1]:[W1204 13:07:30.224177360 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6971157Z [rank2]:[W1204 13:07:30.226746393 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6971500Z [rank0]:[W1204 13:07:30.346699558 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6973700Z [rank3]:[W1204 13:07:30.349183263 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.6973751Z FAILED [22.8253s] [ 5%] 2025-12-04T13:24:33.6973756Z 2025-12-04T13:24:33.6973834Z =================================== FAILURES =================================== 2025-12-04T13:24:33.6973943Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T13:24:33.6973991Z Traceback (most recent call last): 2025-12-04T13:24:33.6974161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.6974207Z self._join_processes(fn) 2025-12-04T13:24:33.6974383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.6974438Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.6974617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.6974677Z raise RuntimeError(error) 2025-12-04T13:24:33.6974783Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.6974844Z Traceback (most recent call last): 2025-12-04T13:24:33.6975004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6975048Z getattr(self, test_name)() 2025-12-04T13:24:33.6975208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6995815Z fn() 2025-12-04T13:24:33.6995968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6996010Z method(*args, **kwargs) 2025-12-04T13:24:33.6996162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6996203Z method(*args, **kwargs) 2025-12-04T13:24:33.6996355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6996395Z with policy(): 2025-12-04T13:24:33.6996547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6996587Z raise RuntimeError(msg) 2025-12-04T13:24:33.6996951Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17637048320. 
2025-12-04T13:24:33.6996954Z 2025-12-04T13:24:33.6997032Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6997278Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.6997282Z 2025-12-04T13:24:33.6997372Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6997375Z 2025-12-04T13:24:33.6997434Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.6997479Z Traceback (most recent call last): 2025-12-04T13:24:33.6997643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6997685Z getattr(self, test_name)() 2025-12-04T13:24:33.6997842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.6997877Z fn() 2025-12-04T13:24:33.6998027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6998068Z method(*args, **kwargs) 2025-12-04T13:24:33.6998241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.6998283Z method(*args, **kwargs) 2025-12-04T13:24:33.6998432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.6998469Z with policy(): 2025-12-04T13:24:33.6998622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.6998663Z raise RuntimeError(msg) 2025-12-04T13:24:33.6999023Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17500733440. 
2025-12-04T13:24:33.6999025Z 2025-12-04T13:24:33.6999116Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.6999371Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.6999390Z 2025-12-04T13:24:33.6999477Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.6999480Z 2025-12-04T13:24:33.6999538Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.6999582Z Traceback (most recent call last): 2025-12-04T13:24:33.6999783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.6999824Z getattr(self, test_name)() 2025-12-04T13:24:33.6999981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7000015Z fn() 2025-12-04T13:24:33.7000167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7000208Z method(*args, **kwargs) 2025-12-04T13:24:33.7000359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7000398Z method(*args, **kwargs) 2025-12-04T13:24:33.7000548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7000584Z with policy(): 2025-12-04T13:24:33.7000735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7000775Z raise RuntimeError(msg) 2025-12-04T13:24:33.7001138Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 2025-12-04T13:24:33.7001142Z 2025-12-04T13:24:33.7001215Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7001456Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7001458Z 2025-12-04T13:24:33.7001545Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7001547Z 2025-12-04T13:24:33.7001549Z 2025-12-04T13:24:33.7001626Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7001715Z Process 0 terminated with exit code 10, terminating remaining processes. 
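The ProcessGroupNCCL warnings earlier ("destroy_process_group() was not called before program exit") point at missing teardown in the test processes. The usual pattern is to destroy the group in a finally block so it runs even when the body raises; a minimal sketch:

import torch.distributed as dist

def main():
    dist.init_process_group("nccl")
    try:
        dist.barrier()  # stand-in for the real per-rank workload
    finally:
        # Explicit teardown releases NCCL resources and silences the
        # exit-time leak warning.
        dist.destroy_process_group()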
2025-12-04T13:24:33.7001954Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-613551715bb88517.xml - 2025-12-04T13:24:33.7002031Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7002294Z FAILED [22.8253s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7002341Z Traceback (most recent call last): 2025-12-04T13:24:33.7002504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7002547Z getattr(self, test_name)() 2025-12-04T13:24:33.7002707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7002741Z fn() 2025-12-04T13:24:33.7002891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7002945Z method(*args, **kwargs) 2025-12-04T13:24:33.7003097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7003165Z method(*args, **kwargs) 2025-12-04T13:24:33.7003316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7003354Z with policy(): 2025-12-04T13:24:33.7003504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7003546Z raise RuntimeError(msg) 2025-12-04T13:24:33.7003906Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17637048320. 
2025-12-04T13:24:33.7003909Z 2025-12-04T13:24:33.7003984Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7004224Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7004227Z 2025-12-04T13:24:33.7004315Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7004317Z 2025-12-04T13:24:33.7004374Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7004420Z Traceback (most recent call last): 2025-12-04T13:24:33.7004582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7004625Z getattr(self, test_name)() 2025-12-04T13:24:33.7004784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7004818Z fn() 2025-12-04T13:24:33.7004971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7005013Z method(*args, **kwargs) 2025-12-04T13:24:33.7005166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7005205Z method(*args, **kwargs) 2025-12-04T13:24:33.7005356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7005392Z with policy(): 2025-12-04T13:24:33.7005545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7005585Z raise RuntimeError(msg) 2025-12-04T13:24:33.7005965Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17500733440. 
2025-12-04T13:24:33.7005968Z 2025-12-04T13:24:33.7006041Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7006283Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7006285Z 2025-12-04T13:24:33.7006371Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7006375Z 2025-12-04T13:24:33.7006431Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7006477Z Traceback (most recent call last): 2025-12-04T13:24:33.7006638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7006680Z getattr(self, test_name)() 2025-12-04T13:24:33.7006871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7006918Z fn() 2025-12-04T13:24:33.7007081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7007122Z method(*args, **kwargs) 2025-12-04T13:24:33.7007273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7007313Z method(*args, **kwargs) 2025-12-04T13:24:33.7007462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7007500Z with policy(): 2025-12-04T13:24:33.7007650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7007692Z raise RuntimeError(msg) 2025-12-04T13:24:33.7008053Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 2025-12-04T13:24:33.7008057Z 2025-12-04T13:24:33.7008130Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7008368Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7008372Z 2025-12-04T13:24:33.7008458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7008523Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.7008586Z ======================= 1 failed, 4 deselected in 22.96s ======================= 2025-12-04T13:24:33.7008625Z Got exit code 1 2025-12-04T13:24:33.7008666Z Retrying single test... 
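Before the retry output, it is worth decoding the numbers in the RuntimeError ("Caching allocator allocated memory was 512 and is now reported as 117248"): the checker snapshots per-device allocator counters before the test and compares them afterwards. A rough sketch of that bookkeeping, under the assumption that it mirrors what PYTORCH_TEST_CUDA_MEM_LEAK_CHECK enables (the helper name is hypothetical):

import gc
import torch

def run_with_leak_check(fn, device=0):
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)  # caching-allocator bytes in use
    fn()
    gc.collect()                                  # drop Python-side references first
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(
            f"possible CUDA leak: allocator memory was {before} "
            f"and is now {after} on device {device}"
        )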
2025-12-04T13:24:33.7008860Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e3dd5e512506930b.xml 2025-12-04T13:24:33.7008920Z ============================= test session starts ============================== 2025-12-04T13:24:33.7009036Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7009077Z cachedir: .pytest_cache 2025-12-04T13:24:33.7009239Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7009285Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7009326Z configfile: pytest.ini 2025-12-04T13:24:33.7009493Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7009570Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7009872Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7009919Z Running 1 items in this shard 2025-12-04T13:24:33.7009921Z 2025-12-04T13:24:33.7010239Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 13:07:47.217000 441132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 441201 2025-12-04T13:24:33.7010396Z I1204 13:07:47.218000 441132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 441202 2025-12-04T13:24:33.7010550Z I1204 13:07:47.219000 441132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 441203 2025-12-04T13:24:33.7010724Z I1204 13:07:47.219000 441132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 441204 2025-12-04T13:24:33.7011322Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7011376Z _warn_cpu_init() 2025-12-04T13:24:33.7011683Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7011720Z _init_core_state( 2025-12-04T13:24:33.7012218Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.7012284Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7012861Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7012900Z _warn_cpu_init() 2025-12-04T13:24:33.7013202Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7013241Z _init_core_state( 2025-12-04T13:24:33.7013733Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7013793Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7014376Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7014414Z _warn_cpu_init() 2025-12-04T13:24:33.7014715Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7014751Z _init_core_state( 2025-12-04T13:24:33.7015246Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7015315Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7015814Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.7015885Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7016458Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7016498Z _warn_cpu_init() 2025-12-04T13:24:33.7016989Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7017047Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7017348Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7017384Z _init_core_state( 2025-12-04T13:24:33.7017877Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7017935Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7018422Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7018479Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7019828Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7019971Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7021268Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7021415Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7022696Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7022819Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7024108Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7024229Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7024459Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7024504Z return func(*args, **kwargs) 2025-12-04T13:24:33.7024728Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7024771Z return func(*args, **kwargs) 2025-12-04T13:24:33.7025006Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7025072Z return func(*args, **kwargs) 2025-12-04T13:24:33.7025295Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7025337Z return func(*args, **kwargs) 2025-12-04T13:24:33.7025556Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7025598Z return func(*args, **kwargs) 2025-12-04T13:24:33.7025818Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7025859Z return func(*args, **kwargs) 2025-12-04T13:24:33.7026082Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7026122Z return func(*args, **kwargs) 2025-12-04T13:24:33.7026343Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7026382Z return func(*args, **kwargs) 2025-12-04T13:24:33.7026676Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.7026715Z return func(*args, **kwargs) 2025-12-04T13:24:33.7026864Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7027028Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7027325Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7027484Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7027773Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7027900Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7028197Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7028351Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7028628Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7028779Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7029066Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7029217Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7029511Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7029661Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7030194Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 
2025-12-04T13:24:33.7030312Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7030512Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7030886Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7031001Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7031216Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7031383Z [rank2]:E1204 13:07:55.832000 441203 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7031425Z dist init r=2, world=4 2025-12-04T13:24:33.7031564Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7031725Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7032013Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7032171Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7032474Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7032601Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7032881Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7033029Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7033307Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7033469Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7033761Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7033913Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7034194Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7034343Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7034842Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17500733440. 2025-12-04T13:24:33.7034960Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7035157Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7035526Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7035642Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7035855Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7036023Z [rank1]:E1204 13:07:55.880000 441202 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7036062Z dist init r=1, world=4 2025-12-04T13:24:33.7036201Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7036360Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7036649Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7036817Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7037108Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7037234Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7037511Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7037673Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7037963Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7038126Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7038402Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7038540Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7038820Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7038971Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7039465Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17637048320. 2025-12-04T13:24:33.7039580Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7039819Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7040188Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7040304Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7040516Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7040681Z [rank0]:E1204 13:07:55.902000 441201 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7040720Z dist init r=0, world=4 2025-12-04T13:24:33.7040858Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7041040Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7041328Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7041485Z [rank3]:E1204 13:07:55.905000 441204 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7041776Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7041902Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7042194Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7042374Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7042653Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7042800Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7043078Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7043215Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7043496Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7043646Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7044138Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17433624576. 
2025-12-04T13:24:33.7044255Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7044452Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7044821Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7044934Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7045147Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7045324Z [rank3]:E1204 13:07:55.905000 441204 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7045364Z dist init r=3, world=4 2025-12-04T13:24:33.7045707Z [rank2]:[W1204 13:07:56.521959481 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7046039Z [rank1]:[W1204 13:07:56.676176991 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7046388Z [rank0]:[W1204 13:07:56.701862064 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7046730Z [rank3]:[W1204 13:07:56.745194219 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7046787Z FAILED [23.0255s] [100%] 2025-12-04T13:24:33.7046790Z 2025-12-04T13:24:33.7046848Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7046954Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T13:24:33.7047001Z Traceback (most recent call last): 2025-12-04T13:24:33.7047164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7047209Z self._join_processes(fn) 2025-12-04T13:24:33.7047384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7047442Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7047624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7047669Z raise RuntimeError(error) 2025-12-04T13:24:33.7047750Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7047795Z Traceback (most recent call last): 2025-12-04T13:24:33.7047956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7048000Z getattr(self, test_name)() 2025-12-04T13:24:33.7048157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7048192Z fn() 2025-12-04T13:24:33.7048345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7048389Z method(*args, **kwargs) 2025-12-04T13:24:33.7048542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7048584Z method(*args, **kwargs) 2025-12-04T13:24:33.7048735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7048773Z with policy(): 2025-12-04T13:24:33.7048926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7048966Z raise RuntimeError(msg) 2025-12-04T13:24:33.7049332Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 
2025-12-04T13:24:33.7049335Z 2025-12-04T13:24:33.7049422Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7049666Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7049669Z 2025-12-04T13:24:33.7049792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7049794Z 2025-12-04T13:24:33.7049796Z 2025-12-04T13:24:33.7049874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7049962Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7050199Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e3dd5e512506930b.xml - 2025-12-04T13:24:33.7050277Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7050548Z FAILED [23.0255s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7050610Z Traceback (most recent call last): 2025-12-04T13:24:33.7050774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7050817Z getattr(self, test_name)() 2025-12-04T13:24:33.7050977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7051015Z fn() 2025-12-04T13:24:33.7051170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7051211Z method(*args, **kwargs) 2025-12-04T13:24:33.7051364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7051407Z method(*args, **kwargs) 2025-12-04T13:24:33.7051557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7051595Z with policy(): 2025-12-04T13:24:33.7051747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7051789Z raise RuntimeError(msg) 2025-12-04T13:24:33.7052150Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 2025-12-04T13:24:33.7052155Z 2025-12-04T13:24:33.7052230Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7052472Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7052475Z 2025-12-04T13:24:33.7052561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7052626Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
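For reference, the repro line printed above can be driven from Python as well; this convenience sketch only re-expresses that command with its two environment variables and assumes the current working directory is the base repo dir, as the log instructs.

import os
import subprocess

env = {**os.environ,
       "PYTORCH_TEST_WITH_ROCM": "1",
       "PYTORCH_TEST_CUDA_MEM_LEAK_CHECK": "1"}
subprocess.run(
    ["python", "test/distributed/fsdp/test_fsdp_core.py",
     "TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda"],
    env=env,
    check=True,  # raises CalledProcessError if the leak reproduces
)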
2025-12-04T13:24:33.7052689Z ====================== 1 failed, 20 deselected in 23.18s ======================= 2025-12-04T13:24:33.7052728Z Got exit code 1 2025-12-04T13:24:33.7052768Z Retrying single test... 2025-12-04T13:24:33.7052961Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-54f17f9277f8d2d8.xml 2025-12-04T13:24:33.7053018Z ============================= test session starts ============================== 2025-12-04T13:24:33.7053135Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7053193Z cachedir: .pytest_cache 2025-12-04T13:24:33.7053358Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7053405Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7053447Z configfile: pytest.ini 2025-12-04T13:24:33.7053609Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7053686Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7053920Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7053965Z Running 1 items in this shard 2025-12-04T13:24:33.7053968Z 2025-12-04T13:24:33.7054297Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 13:08:12.740000 442398 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 442467 2025-12-04T13:24:33.7054473Z I1204 13:08:12.741000 442398 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 442468 2025-12-04T13:24:33.7054626Z I1204 13:08:12.741000 442398 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 442469 2025-12-04T13:24:33.7054776Z I1204 13:08:12.742000 442398 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 442470 2025-12-04T13:24:33.7055361Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7055401Z _warn_cpu_init() 2025-12-04T13:24:33.7055711Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7055750Z _init_core_state( 2025-12-04T13:24:33.7056247Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.7056311Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7056888Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7056926Z _warn_cpu_init() 2025-12-04T13:24:33.7057226Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7057264Z _init_core_state( 2025-12-04T13:24:33.7057770Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7057833Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7058409Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7058445Z _warn_cpu_init() 2025-12-04T13:24:33.7058756Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7058819Z _init_core_state( 2025-12-04T13:24:33.7059311Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7059371Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7059976Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
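The two FSDP warnings above (CPU-side sharding init, and a `device_id` of plain "cuda" without an index) both point at the same remedy: pick an explicit device per rank before wrapping. A minimal sketch under that assumption; the wrap helper and one-GPU-per-rank mapping are illustrative, not part of the test suite.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap(module: torch.nn.Module, rank: int) -> FSDP:
    torch.cuda.set_device(rank)  # give "cuda" an explicit index
    return FSDP(
        module,
        device_id=torch.cuda.current_device(),  # indexed, as the warning asks
        sync_module_states=True,                # needs the GPU path to work
    )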
2025-12-04T13:24:33.7060016Z _warn_cpu_init() 2025-12-04T13:24:33.7060506Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7060565Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7061058Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7061117Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7061418Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7061454Z _init_core_state( 2025-12-04T13:24:33.7061943Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7062002Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7062503Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7062564Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7063863Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
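The AccumulateGrad stream-mismatch warning repeated below names its own opt-out switch; if the mismatch is intentional, it can be silenced as follows (assuming a build that exposes this API, which the warning text itself cites).

import torch

# Per the warning: only appropriate when the stream mismatch is intentional.
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)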
2025-12-04T13:24:33.7064018Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7065291Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7065417Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7066696Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7066819Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7068102Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7068237Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7068477Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7068533Z return func(*args, **kwargs) 2025-12-04T13:24:33.7068757Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7068800Z return func(*args, **kwargs) 2025-12-04T13:24:33.7069021Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7069063Z return func(*args, **kwargs) 2025-12-04T13:24:33.7069284Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7069328Z return func(*args, **kwargs) 2025-12-04T13:24:33.7069549Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7069589Z return func(*args, **kwargs) 2025-12-04T13:24:33.7069842Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7069882Z return func(*args, **kwargs) 2025-12-04T13:24:33.7070103Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7070143Z return func(*args, **kwargs) 2025-12-04T13:24:33.7070366Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7070408Z return func(*args, **kwargs) 2025-12-04T13:24:33.7070707Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.7070747Z return func(*args, **kwargs) 2025-12-04T13:24:33.7070894Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7071059Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7071354Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7071531Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7071818Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7071946Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7072224Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7072389Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7072668Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7072844Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7073125Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7073263Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7073544Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7073695Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7074191Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 
2025-12-04T13:24:33.7074308Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7074506Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7074879Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7074997Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7075212Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7075379Z [rank2]:E1204 13:08:21.139000 442469 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7075420Z dist init r=2, world=4 2025-12-04T13:24:33.7075560Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7075732Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7076022Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7076178Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7076465Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7076590Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7076888Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7077062Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7077340Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7077489Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7077770Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7077910Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7078190Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7078341Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7078830Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17500733440. 2025-12-04T13:24:33.7078948Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7079146Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7079517Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7079631Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7079880Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7080049Z [rank1]:E1204 13:08:21.147000 442468 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7080103Z dist init r=1, world=4 2025-12-04T13:24:33.7080244Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7080405Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7080694Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7080847Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7081149Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7081288Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7081582Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7081731Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7082007Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7082157Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7082438Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7082577Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7082857Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7083005Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7083494Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17637048320. 2025-12-04T13:24:33.7083610Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7083808Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7084176Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7084289Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7084514Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7084680Z [rank0]:E1204 13:08:21.196000 442467 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7084720Z dist init r=0, world=4 2025-12-04T13:24:33.7084858Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7085018Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7085318Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7085484Z [rank3]:E1204 13:08:21.201000 442470 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7085782Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7085906Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7086186Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7086333Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7086613Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7086761Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7087043Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7087179Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7087460Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7087611Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7088098Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17433624576. 
2025-12-04T13:24:33.7088213Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7088409Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7088789Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7088904Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7089116Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7089283Z [rank3]:E1204 13:08:21.201000 442470 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7089322Z dist init r=3, world=4 2025-12-04T13:24:33.7089673Z [rank2]:[W1204 13:08:21.814239391 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7090098Z [rank1]:[W1204 13:08:21.828645213 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7090446Z [rank0]:[W1204 13:08:21.966236370 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7090776Z [rank3]:[W1204 13:08:21.979035928 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7090818Z FAILED [22.7268s] [100%] 2025-12-04T13:24:33.7090821Z 2025-12-04T13:24:33.7090880Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7090985Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T13:24:33.7091034Z Traceback (most recent call last): 2025-12-04T13:24:33.7091199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7091244Z self._join_processes(fn) 2025-12-04T13:24:33.7091418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7091474Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7091653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7091698Z raise RuntimeError(error) 2025-12-04T13:24:33.7091779Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7091829Z Traceback (most recent call last): 2025-12-04T13:24:33.7091990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7092034Z getattr(self, test_name)() 2025-12-04T13:24:33.7092194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7092230Z fn() 2025-12-04T13:24:33.7092381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7092424Z method(*args, **kwargs) 2025-12-04T13:24:33.7092576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7092615Z method(*args, **kwargs) 2025-12-04T13:24:33.7092792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7092830Z with policy(): 2025-12-04T13:24:33.7092983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7093024Z raise RuntimeError(msg) 2025-12-04T13:24:33.7093390Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 
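The ProcessGroupNCCL warnings above ask for an explicit shutdown before the process exits. A minimal teardown sketch; run_training is a hypothetical stand-in for the actual test or training body.

import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    try:
        run_training()  # hypothetical workload body
    finally:
        # Explicit shutdown, as the warning recommends, so resources
        # are released before program exit.
        dist.destroy_process_group()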
2025-12-04T13:24:33.7093392Z 2025-12-04T13:24:33.7093468Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7093726Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7093742Z 2025-12-04T13:24:33.7093830Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7093845Z 2025-12-04T13:24:33.7093848Z 2025-12-04T13:24:33.7093924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7094013Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7094251Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-54f17f9277f8d2d8.xml - 2025-12-04T13:24:33.7094313Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7094573Z FAILED [22.7268s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7094621Z Traceback (most recent call last): 2025-12-04T13:24:33.7094789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7094835Z getattr(self, test_name)() 2025-12-04T13:24:33.7094995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7095031Z fn() 2025-12-04T13:24:33.7095183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7095224Z method(*args, **kwargs) 2025-12-04T13:24:33.7095375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7095417Z method(*args, **kwargs) 2025-12-04T13:24:33.7095567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7095606Z with policy(): 2025-12-04T13:24:33.7095760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7095804Z raise RuntimeError(msg) 2025-12-04T13:24:33.7096170Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17483956224. 2025-12-04T13:24:33.7096172Z 2025-12-04T13:24:33.7096246Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7096491Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7096493Z 2025-12-04T13:24:33.7096581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7096657Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
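The leak report above is raised by the `with policy():` context manager shown in the traceback (common_utils.py, armed by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): it snapshots memory around the test body and fails when both the caching allocator and the CUDA driver report growth. Below is a minimal sketch of that before/after comparison, assuming a single-process caller and a zero-growth threshold; it is an illustration, not PyTorch's internal implementation.

import torch

def check_for_leak(device: int, run_test) -> None:
    # Snapshot caching-allocator usage and driver-reported free memory
    # before the test body. mem_get_info() queries the driver directly
    # (hipMemGetInfo under ROCm), mirroring the "CUDA driver API
    # confirmed a leak" wording in the failure message above.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)

    run_test()

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    # Fail only when the allocator says more memory is live *and* the
    # driver confirms less free memory, as in the report above.
    if alloc_after > alloc_before and free_after < free_before:
        raise RuntimeError(
            f"possible leak on device {device}: allocator went "
            f"{alloc_before} -> {alloc_after} bytes, driver-allocated "
            f"{total - free_before} -> {total - free_after} bytes"
        )

The repro line printed with each failure (PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py ...) re-runs a single test with the same checker armed.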
2025-12-04T13:24:33.7096721Z ====================== 1 failed, 20 deselected in 22.89s ======================= 2025-12-04T13:24:33.7096761Z Got exit code 1 2025-12-04T13:24:33.7096948Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7097078Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.7097267Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-edb1798025bb2f53.xml 2025-12-04T13:24:33.7097325Z ============================= test session starts ============================== 2025-12-04T13:24:33.7097438Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7097492Z cachedir: .pytest_cache 2025-12-04T13:24:33.7097666Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7097725Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7097767Z configfile: pytest.ini 2025-12-04T13:24:33.7097929Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7098006Z collecting ... collected 60 items / 5 deselected / 55 selected 2025-12-04T13:24:33.7098059Z stepcurrent: skipping 5 already run items. 2025-12-04T13:24:33.7098104Z Running 16 items in this shard 2025-12-04T13:24:33.7098106Z 2025-12-04T13:24:33.7098422Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 13:08:37.883000 443664 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 443733 2025-12-04T13:24:33.7098580Z I1204 13:08:37.883000 443664 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 443734 2025-12-04T13:24:33.7098734Z I1204 13:08:37.884000 443664 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 443735 2025-12-04T13:24:33.7098887Z I1204 13:08:37.884000 443664 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 443736 2025-12-04T13:24:33.7099466Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7099506Z _warn_cpu_init() 2025-12-04T13:24:33.7099853Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7099893Z _init_core_state( 2025-12-04T13:24:33.7100388Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7100449Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7101039Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7101079Z _warn_cpu_init() 2025-12-04T13:24:33.7101384Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7101422Z _init_core_state( 2025-12-04T13:24:33.7101929Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7102005Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7102575Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7102628Z _warn_cpu_init() 2025-12-04T13:24:33.7102929Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7102967Z _init_core_state( 2025-12-04T13:24:33.7103462Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7103521Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7104097Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7104134Z _warn_cpu_init() 2025-12-04T13:24:33.7104627Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7104689Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7105178Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7105237Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7105739Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7105800Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7106103Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7106140Z _init_core_state( 2025-12-04T13:24:33.7106644Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7106730Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7107025Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7107067Z return func(*args, **kwargs) 2025-12-04T13:24:33.7107297Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7107339Z return func(*args, **kwargs) 2025-12-04T13:24:33.7107564Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7107608Z return func(*args, **kwargs) 2025-12-04T13:24:33.7107828Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:24:33.7107870Z return func(*args, **kwargs) 2025-12-04T13:24:33.7108091Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7108133Z return func(*args, **kwargs) 2025-12-04T13:24:33.7108351Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7108392Z return func(*args, **kwargs) 2025-12-04T13:24:33.7108614Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7108658Z return func(*args, **kwargs) 2025-12-04T13:24:33.7108877Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7108918Z return func(*args, **kwargs) 2025-12-04T13:24:33.7109137Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7109179Z return func(*args, **kwargs) 2025-12-04T13:24:33.7109325Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7109491Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7109828Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7109986Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7110274Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7110401Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7110697Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7110862Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7111160Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7111310Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7111587Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7111726Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7112007Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7112158Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7112649Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:24:33.7112766Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7112966Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7113339Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7113455Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7113667Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7113832Z [rank1]:E1204 13:08:46.577000 443734 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7113871Z dist init r=1, world=4 2025-12-04T13:24:33.7114024Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7114185Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7114473Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7114629Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7114914Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7115053Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7115355Z [rank3]:E1204 13:08:46.580000 443736 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7115508Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7115787Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7115935Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7116215Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7116354Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7116633Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7116782Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7117272Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 
2025-12-04T13:24:33.7117389Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7117586Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7117958Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7118072Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7118301Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7118467Z [rank3]:E1204 13:08:46.580000 443736 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7118507Z dist init r=3, world=4 2025-12-04T13:24:33.7118645Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7118806Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7119095Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7119258Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7119557Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7119727Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7120006Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7120158Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7120441Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7120591Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7120868Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7121006Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7121285Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7121437Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7121924Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:24:33.7122040Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7122238Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7122622Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7122738Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7122951Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7123116Z [rank2]:E1204 13:08:46.583000 443735 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7123154Z dist init r=2, world=4 2025-12-04T13:24:33.7123293Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7123452Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7123754Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7123945Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7124230Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7124356Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7124635Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7124788Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7125067Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7125216Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7125494Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7125630Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7125911Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7126060Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7126548Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:24:33.7126662Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7126872Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7127245Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7127360Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7127574Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7127738Z [rank0]:E1204 13:08:46.624000 443733 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7127779Z dist init r=0, world=4 2025-12-04T13:24:33.7128129Z [rank3]:[W1204 13:08:46.250329683 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7128487Z [rank1]:[W1204 13:08:46.255637326 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7128821Z [rank2]:[W1204 13:08:46.261224523 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7129153Z [rank0]:[W1204 13:08:46.358300853 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7129197Z FAILED [23.0241s] [ 6%] 2025-12-04T13:24:33.7129199Z 2025-12-04T13:24:33.7129255Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7129362Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.7129408Z Traceback (most recent call last): 2025-12-04T13:24:33.7129575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7129618Z self._join_processes(fn) 2025-12-04T13:24:33.7129834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7129888Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7130071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7130116Z raise RuntimeError(error) 2025-12-04T13:24:33.7130197Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7130242Z Traceback (most recent call last): 2025-12-04T13:24:33.7130404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7130448Z getattr(self, test_name)() 2025-12-04T13:24:33.7130607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7130643Z fn() 2025-12-04T13:24:33.7130795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7130837Z method(*args, **kwargs) 2025-12-04T13:24:33.7131003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7131046Z method(*args, **kwargs) 2025-12-04T13:24:33.7131198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7131236Z with policy(): 2025-12-04T13:24:33.7131388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7131430Z raise RuntimeError(msg) 2025-12-04T13:24:33.7131791Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 
2025-12-04T13:24:33.7131794Z 2025-12-04T13:24:33.7131886Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7132129Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7132160Z 2025-12-04T13:24:33.7132248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7132250Z 2025-12-04T13:24:33.7132252Z 2025-12-04T13:24:33.7132328Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7132416Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7132652Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-edb1798025bb2f53.xml - 2025-12-04T13:24:33.7132713Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7132970Z FAILED [23.0241s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7133017Z Traceback (most recent call last): 2025-12-04T13:24:33.7133183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7133225Z getattr(self, test_name)() 2025-12-04T13:24:33.7133385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7133420Z fn() 2025-12-04T13:24:33.7133576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7133616Z method(*args, **kwargs) 2025-12-04T13:24:33.7133769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7133810Z method(*args, **kwargs) 2025-12-04T13:24:33.7133963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7134003Z with policy(): 2025-12-04T13:24:33.7134156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7134198Z raise RuntimeError(msg) 2025-12-04T13:24:33.7134560Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:24:33.7134562Z 2025-12-04T13:24:33.7134638Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7134878Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7134880Z 2025-12-04T13:24:33.7134979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7135043Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
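Two warnings recur throughout these runs: ProcessGroupNCCL complains that destroy_process_group() was never called before exit, and FSDP warns that `device_id` was passed as a bare "cuda" with no index, forcing it to guess the current device. Below is a minimal per-rank sketch of the setup and teardown those warnings ask for; the function, model, and port (run_rank, the Linear layer, 29500) are illustrative assumptions, not code from the test suite.

import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def run_rank(rank: int, world_size: int) -> None:
    # Rendezvous settings for a single-host run (illustrative values).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Bind the process to one GPU first so "current device" lookups
    # (inside barrier(), FSDP init, etc.) are unambiguous.
    torch.cuda.set_device(rank)
    dist.init_process_group(
        "nccl",  # RCCL on ROCm
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),  # mutes the barrier() warning
    )

    # Pass an *indexed* device rather than the bare "cuda" string the
    # warning flags; FSDP then also moves the CPU-resident module.
    model = FSDP(nn.Linear(8, 8), device_id=torch.device("cuda", rank))
    model(torch.randn(4, 8, device=f"cuda:{rank}")).sum().backward()

    # Explicit teardown is what silences the ProcessGroupNCCL
    # "destroy_process_group() was not called" warning above.
    dist.destroy_process_group()

Launched once per rank (e.g. torch.multiprocessing.spawn(run_rank, args=(4,), nprocs=4)), this keeps each rank pinned to cuda:rank as the warnings recommend.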
2025-12-04T13:24:33.7135106Z ======================= 1 failed, 5 deselected in 23.18s ======================= 2025-12-04T13:24:33.7135143Z Got exit code 1 2025-12-04T13:24:33.7135185Z Retrying single test... 2025-12-04T13:24:33.7135375Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77e85d29215e870f.xml 2025-12-04T13:24:33.7135434Z ============================= test session starts ============================== 2025-12-04T13:24:33.7135547Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7135590Z cachedir: .pytest_cache 2025-12-04T13:24:33.7135763Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7135818Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7135874Z configfile: pytest.ini 2025-12-04T13:24:33.7136036Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7136113Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7136345Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7136390Z Running 1 items in this shard 2025-12-04T13:24:33.7136392Z 2025-12-04T13:24:33.7136711Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 13:09:03.253000 445074 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 445143 2025-12-04T13:24:33.7136870Z I1204 13:09:03.254000 445074 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 445144 2025-12-04T13:24:33.7137023Z I1204 13:09:03.254000 445074 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 445145 2025-12-04T13:24:33.7137175Z I1204 13:09:03.255000 445074 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 445146 2025-12-04T13:24:33.7137758Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7137796Z _warn_cpu_init() 2025-12-04T13:24:33.7138103Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7138141Z _init_core_state( 2025-12-04T13:24:33.7138640Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.7138700Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7139288Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7139328Z _warn_cpu_init() 2025-12-04T13:24:33.7139950Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7139988Z _warn_cpu_init() 2025-12-04T13:24:33.7140315Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7140382Z _init_core_state( 2025-12-04T13:24:33.7140873Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7140934Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7141234Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7141271Z _init_core_state( 2025-12-04T13:24:33.7141765Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7141826Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7142398Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7142435Z _warn_cpu_init() 2025-12-04T13:24:33.7142925Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7142984Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7143469Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7143529Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7143845Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7143884Z _init_core_state( 2025-12-04T13:24:33.7144375Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7144433Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7144738Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7144793Z return func(*args, **kwargs) 2025-12-04T13:24:33.7145295Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7145353Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7145583Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7145628Z return func(*args, **kwargs) 2025-12-04T13:24:33.7145853Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7145898Z return func(*args, **kwargs) 2025-12-04T13:24:33.7146120Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:24:33.7146164Z return func(*args, **kwargs) 2025-12-04T13:24:33.7146388Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7146430Z return func(*args, **kwargs) 2025-12-04T13:24:33.7146651Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7146693Z return func(*args, **kwargs) 2025-12-04T13:24:33.7146913Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7146957Z return func(*args, **kwargs) 2025-12-04T13:24:33.7147177Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7147219Z return func(*args, **kwargs) 2025-12-04T13:24:33.7147440Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7147480Z return func(*args, **kwargs) 2025-12-04T13:24:33.7147627Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7147791Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7148094Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7148251Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7148539Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7148666Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7148958Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7149129Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7149407Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7149556Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7149876Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7150016Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7150298Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7150450Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7150943Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:24:33.7151061Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7151262Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7151632Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7151747Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7151961Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7152127Z [rank2]:E1204 13:09:11.858000 445145 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7152169Z dist init r=2, world=4 2025-12-04T13:24:33.7152322Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7152485Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7152772Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7152928Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7153226Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7153369Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7153662Z [rank1]:E1204 13:09:11.863000 445144 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7153812Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7154091Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7154238Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7154519Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7154656Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7154937Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7155084Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7155572Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 
2025-12-04T13:24:33.7155692Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7155890Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7156257Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7156369Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7156601Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7156767Z [rank1]:E1204 13:09:11.863000 445144 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7156809Z dist init r=1, world=4 2025-12-04T13:24:33.7156946Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7157108Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7157397Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7157562Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7157873Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7157998Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7158279Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7158427Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7158707Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7158859Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7159136Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7159272Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7159551Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7159739Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7160226Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:24:33.7160342Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7160541Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7160922Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7161038Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7161249Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7161414Z [rank0]:E1204 13:09:11.875000 445143 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7161452Z dist init r=0, world=4 2025-12-04T13:24:33.7161592Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7161765Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7162068Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7162236Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7162523Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7162648Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7162928Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7163081Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7163357Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7163505Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7163782Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7163919Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7164201Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7164349Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7164837Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:24:33.7164952Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7165162Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7165533Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7165646Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7165859Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7166023Z [rank3]:E1204 13:09:11.886000 445146 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7166072Z dist init r=3, world=4 2025-12-04T13:24:33.7166426Z [rank2]:[W1204 13:09:12.539074001 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7166775Z [rank1]:[W1204 13:09:12.555837552 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7167106Z [rank0]:[W1204 13:09:12.664400899 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7167439Z [rank3]:[W1204 13:09:12.665067005 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7167482Z FAILED [22.9271s] [100%] 2025-12-04T13:24:33.7167484Z 2025-12-04T13:24:33.7167541Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7167647Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.7167693Z Traceback (most recent call last): 2025-12-04T13:24:33.7167859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7167903Z self._join_processes(fn) 2025-12-04T13:24:33.7168079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7168133Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7168316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7168362Z raise RuntimeError(error) 2025-12-04T13:24:33.7168443Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7168491Z Traceback (most recent call last): 2025-12-04T13:24:33.7168653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7168697Z getattr(self, test_name)() 2025-12-04T13:24:33.7168856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7168892Z fn() 2025-12-04T13:24:33.7169044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7169087Z method(*args, **kwargs) 2025-12-04T13:24:33.7169251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7169294Z method(*args, **kwargs) 2025-12-04T13:24:33.7169445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7169483Z with policy(): 2025-12-04T13:24:33.7169634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7169675Z raise RuntimeError(msg) 2025-12-04T13:24:33.7170092Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 
2025-12-04T13:24:33.7170095Z 2025-12-04T13:24:33.7170188Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7170445Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7170468Z 2025-12-04T13:24:33.7170555Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7170557Z 2025-12-04T13:24:33.7170559Z 2025-12-04T13:24:33.7170636Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7170723Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7170959Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77e85d29215e870f.xml - 2025-12-04T13:24:33.7171019Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7171277Z FAILED [22.9271s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7171324Z Traceback (most recent call last): 2025-12-04T13:24:33.7171491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7171533Z getattr(self, test_name)() 2025-12-04T13:24:33.7171694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7171729Z fn() 2025-12-04T13:24:33.7171882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7171925Z method(*args, **kwargs) 2025-12-04T13:24:33.7172077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7172120Z method(*args, **kwargs) 2025-12-04T13:24:33.7172271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7172310Z with policy(): 2025-12-04T13:24:33.7172461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7172502Z raise RuntimeError(msg) 2025-12-04T13:24:33.7172863Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:24:33.7172865Z 2025-12-04T13:24:33.7172941Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7173196Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7173200Z 2025-12-04T13:24:33.7173288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7173351Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
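(A note on the repeated FSDP device warnings interleaved above, "FSDP got the argument `device_id` cuda ... which does not have an explicit index": the warning text itself names two remedies. The sketch below is a minimal illustration of both, not code from this test suite; `wrap_on_local_gpu`, `module`, and `rank` are placeholder names, with `rank` assumed to be this process's local rank.)

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_local_gpu(module: torch.nn.Module, rank: int) -> FSDP:
        # Remedy 1: pin the current device before FSDP initialization, so a
        # bare device_id="cuda" resolves to the intended GPU.
        torch.cuda.set_device(rank)
        # Remedy 2: pass an indexed device instead of bare "cuda". Passing
        # device_id also moves a CPU-resident module to GPU for sharding
        # init, which is what the _warn_cpu_init() warning above recommends.
        return FSDP(module, device_id=torch.device("cuda", rank))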
2025-12-04T13:24:33.7173415Z ====================== 1 failed, 20 deselected in 23.09s ======================= 2025-12-04T13:24:33.7173452Z Got exit code 1 2025-12-04T13:24:33.7173493Z Retrying single test... 2025-12-04T13:24:33.7173682Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4b634b6216db5416.xml 2025-12-04T13:24:33.7173741Z ============================= test session starts ============================== 2025-12-04T13:24:33.7173859Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7173914Z cachedir: .pytest_cache 2025-12-04T13:24:33.7174093Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7174152Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7174196Z configfile: pytest.ini 2025-12-04T13:24:33.7174361Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7174440Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7174673Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7174722Z Running 1 items in this shard 2025-12-04T13:24:33.7174724Z 2025-12-04T13:24:33.7175041Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 13:09:28.594000 446484 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 446553 2025-12-04T13:24:33.7175205Z I1204 13:09:28.594000 446484 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 446554 2025-12-04T13:24:33.7175359Z I1204 13:09:28.595000 446484 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 446555 2025-12-04T13:24:33.7175513Z I1204 13:09:28.596000 446484 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 446556 2025-12-04T13:24:33.7176099Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7176140Z _warn_cpu_init() 2025-12-04T13:24:33.7176448Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7176486Z _init_core_state( 2025-12-04T13:24:33.7176988Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.7177053Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7177639Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7177682Z _warn_cpu_init() 2025-12-04T13:24:33.7177983Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7178021Z _init_core_state( 2025-12-04T13:24:33.7178524Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7178610Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7179185Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7179224Z _warn_cpu_init() 2025-12-04T13:24:33.7179530Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7179569Z _init_core_state( 2025-12-04T13:24:33.7180126Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7180187Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7180762Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7180807Z _warn_cpu_init() 2025-12-04T13:24:33.7181303Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7181364Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7181664Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7181705Z _init_core_state( 2025-12-04T13:24:33.7182218Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7182279Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7182576Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7182621Z return func(*args, **kwargs) 2025-12-04T13:24:33.7183127Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7183214Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7183704Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7183767Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7183998Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7184047Z return func(*args, **kwargs) 2025-12-04T13:24:33.7184275Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7184324Z return func(*args, **kwargs) 2025-12-04T13:24:33.7184551Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:24:33.7184597Z return func(*args, **kwargs) 2025-12-04T13:24:33.7184820Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7184866Z return func(*args, **kwargs) 2025-12-04T13:24:33.7185088Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7185134Z return func(*args, **kwargs) 2025-12-04T13:24:33.7185357Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7185404Z return func(*args, **kwargs) 2025-12-04T13:24:33.7185627Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7185669Z return func(*args, **kwargs) 2025-12-04T13:24:33.7185892Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7185934Z return func(*args, **kwargs) 2025-12-04T13:24:33.7186086Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7186263Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7186561Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7186720Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7187012Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7187141Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7187440Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7187625Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7187904Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7188056Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7188334Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7188476Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7188758Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7188911Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7189412Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:24:33.7189531Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7189776Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7190147Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7190264Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7190477Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7190660Z [rank3]:E1204 13:09:37.170000 446556 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7190704Z dist init r=3, world=4 2025-12-04T13:24:33.7190844Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7191007Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7191294Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7191454Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7191760Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7191914Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7192192Z [rank1]:E1204 13:09:37.173000 446554 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7192344Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7192640Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7192789Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7193070Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7193207Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7193486Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7193636Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7194133Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 
2025-12-04T13:24:33.7194251Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7194448Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7194817Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7194931Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7195159Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7195326Z [rank1]:E1204 13:09:37.173000 446554 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7195365Z dist init r=1, world=4 2025-12-04T13:24:33.7195505Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7195665Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7195968Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7196135Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7196439Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7196563Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7196843Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7196993Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7197274Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7197424Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7197701Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7197838Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7198117Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7198272Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7198764Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:24:33.7198880Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7199079Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7199466Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7199584Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7199835Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7200003Z [rank0]:E1204 13:09:37.175000 446553 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7200042Z dist init r=0, world=4 2025-12-04T13:24:33.7200181Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7200361Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7200664Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7200837Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7201129Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7201257Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7201539Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7201691Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7201971Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7202120Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7202402Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7202542Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7202823Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7202973Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7203466Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:24:33.7203584Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7203796Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7204164Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7204276Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7204489Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7204670Z [rank2]:E1204 13:09:37.230000 446555 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7204723Z dist init r=2, world=4 2025-12-04T13:24:33.7205061Z [rank3]:[W1204 13:09:37.837976598 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7205411Z [rank1]:[W1204 13:09:37.847011649 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7205744Z [rank0]:[W1204 13:09:37.859592072 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7206076Z [rank2]:[W1204 13:09:37.990089786 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7206121Z FAILED [22.8271s] [100%] 2025-12-04T13:24:33.7206123Z 2025-12-04T13:24:33.7206181Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7206287Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.7206336Z Traceback (most recent call last): 2025-12-04T13:24:33.7206502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7206549Z self._join_processes(fn) 2025-12-04T13:24:33.7206724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7206781Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7206962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7207008Z raise RuntimeError(error) 2025-12-04T13:24:33.7207089Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7207136Z Traceback (most recent call last): 2025-12-04T13:24:33.7207297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7207342Z getattr(self, test_name)() 2025-12-04T13:24:33.7207500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7207537Z fn() 2025-12-04T13:24:33.7207689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7207733Z method(*args, **kwargs) 2025-12-04T13:24:33.7207895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7207940Z method(*args, **kwargs) 2025-12-04T13:24:33.7208093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7208133Z with policy(): 2025-12-04T13:24:33.7208285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7208329Z raise RuntimeError(msg) 2025-12-04T13:24:33.7208705Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 
2025-12-04T13:24:33.7208720Z 2025-12-04T13:24:33.7208798Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7209055Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7209057Z 2025-12-04T13:24:33.7209145Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7209147Z 2025-12-04T13:24:33.7209209Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7209256Z Traceback (most recent call last): 2025-12-04T13:24:33.7209422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7209464Z getattr(self, test_name)() 2025-12-04T13:24:33.7209625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7209660Z fn() 2025-12-04T13:24:33.7209860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7209901Z method(*args, **kwargs) 2025-12-04T13:24:33.7210054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7210095Z method(*args, **kwargs) 2025-12-04T13:24:33.7210247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7210284Z with policy(): 2025-12-04T13:24:33.7210438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7210479Z raise RuntimeError(msg) 2025-12-04T13:24:33.7210844Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:24:33.7210848Z 2025-12-04T13:24:33.7210924Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7211164Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7211166Z 2025-12-04T13:24:33.7211255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7211257Z 2025-12-04T13:24:33.7211258Z 2025-12-04T13:24:33.7211333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7211423Z Process 0 terminated with exit code 10, terminating remaining processes. 
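(Each of the runs above also ends with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit. Below is a minimal sketch of the cleanup pattern that warning asks for; it is illustrative rather than this suite's harness code, `run` is a hypothetical per-process entry point, and MASTER_ADDR/MASTER_PORT are assumed to be set in the environment. Passing `device_id` to init_process_group also addresses the earlier barrier() warning about using the device under the current context.)

    import torch
    import torch.distributed as dist

    def run(rank: int, world_size: int) -> None:
        # An explicit device_id avoids the barrier() "using the device under
        # current context" warning logged earlier in this run. Assumes
        # MASTER_ADDR/MASTER_PORT are already set in the environment.
        dist.init_process_group(
            "nccl", rank=rank, world_size=world_size,
            device_id=torch.device("cuda", rank),
        )
        try:
            pass  # per-rank test body would go here
        finally:
            # The cleanup that the ProcessGroupNCCL warning asks for.
            dist.destroy_process_group()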
2025-12-04T13:24:33.7211678Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4b634b6216db5416.xml - 2025-12-04T13:24:33.7211743Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7212000Z FAILED [22.8271s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7212047Z Traceback (most recent call last): 2025-12-04T13:24:33.7212212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7212255Z getattr(self, test_name)() 2025-12-04T13:24:33.7212418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7212455Z fn() 2025-12-04T13:24:33.7212630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7212686Z method(*args, **kwargs) 2025-12-04T13:24:33.7212840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7212895Z method(*args, **kwargs) 2025-12-04T13:24:33.7213047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7213084Z with policy(): 2025-12-04T13:24:33.7213238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7213279Z raise RuntimeError(msg) 2025-12-04T13:24:33.7213644Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 
2025-12-04T13:24:33.7213647Z 2025-12-04T13:24:33.7213722Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7213963Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7213965Z 2025-12-04T13:24:33.7214053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7214055Z 2025-12-04T13:24:33.7214115Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7214163Z Traceback (most recent call last): 2025-12-04T13:24:33.7214326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7214370Z getattr(self, test_name)() 2025-12-04T13:24:33.7214530Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7214568Z fn() 2025-12-04T13:24:33.7214719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7214761Z method(*args, **kwargs) 2025-12-04T13:24:33.7214910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7214952Z method(*args, **kwargs) 2025-12-04T13:24:33.7215103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7215142Z with policy(): 2025-12-04T13:24:33.7215293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7215335Z raise RuntimeError(msg) 2025-12-04T13:24:33.7215707Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:24:33.7215714Z 2025-12-04T13:24:33.7215788Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7216027Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7216030Z 2025-12-04T13:24:33.7216116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7216181Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
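[Note: for context on the RuntimeError above: with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 the harness snapshots per-device memory before and after each test, from both the caching allocator and the driver, and fails the test when the driver-level numbers confirm the allocator-level growth (hence "CUDA driver API confirmed a leak"). Below is a minimal sketch of that before/after accounting, assuming a CUDA/ROCm build of torch; the leak_check helper is illustrative, not the harness's actual implementation.]

    # Illustrative before/after memory accounting, in the spirit of the
    # harness's leak check; not PyTorch's real leak-check code.
    import contextlib
    import torch

    @contextlib.contextmanager
    def leak_check(device: int):
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)  # caching-allocator view
        free, total = torch.cuda.mem_get_info(device)
        driver_before = total - free                        # driver-level view
        yield
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible leak on device {device}: allocator "
                f"{alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes"
            )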
2025-12-04T13:24:33.7216245Z ====================== 1 failed, 20 deselected in 22.99s ======================= 2025-12-04T13:24:33.7216286Z Got exit code 1 2025-12-04T13:24:33.7216484Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7216629Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.7216833Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-81b2950bc580a47e.xml 2025-12-04T13:24:33.7216894Z ============================= test session starts ============================== 2025-12-04T13:24:33.7217007Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7217052Z cachedir: .pytest_cache 2025-12-04T13:24:33.7217212Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7217261Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7217303Z configfile: pytest.ini 2025-12-04T13:24:33.7217470Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7217546Z collecting ... collected 60 items / 6 deselected / 54 selected 2025-12-04T13:24:33.7217601Z stepcurrent: skipping 6 already run items. 2025-12-04T13:24:33.7217646Z Running 15 items in this shard 2025-12-04T13:24:33.7217648Z 2025-12-04T13:24:33.7217987Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 13:09:53.844000 447894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 447963 2025-12-04T13:24:33.7220451Z I1204 13:09:53.844000 447894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 447964 2025-12-04T13:24:33.7220614Z I1204 13:09:53.845000 447894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 447965 2025-12-04T13:24:33.7220768Z I1204 13:09:53.846000 447894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 447966 2025-12-04T13:24:33.7221351Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7221394Z _warn_cpu_init() 2025-12-04T13:24:33.7221693Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7221732Z _init_core_state( 2025-12-04T13:24:33.7222270Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.7222336Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7222906Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7222946Z _warn_cpu_init() 2025-12-04T13:24:33.7223258Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7223332Z _init_core_state( 2025-12-04T13:24:33.7223821Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7223882Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7224452Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7225983Z _warn_cpu_init() 2025-12-04T13:24:33.7226471Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7226541Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7227034Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7227091Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7227386Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7227423Z _init_core_state( 2025-12-04T13:24:33.7227925Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7228006Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7228493Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7228554Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7229877Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7230007Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7231278Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7231419Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7232703Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7232848Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7234108Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7234231Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7234461Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7234506Z return func(*args, **kwargs) 2025-12-04T13:24:33.7234732Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7234778Z return func(*args, **kwargs) 2025-12-04T13:24:33.7235000Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7235044Z return func(*args, **kwargs) 2025-12-04T13:24:33.7235267Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7235306Z return func(*args, **kwargs) 2025-12-04T13:24:33.7235527Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7235566Z return func(*args, **kwargs) 2025-12-04T13:24:33.7235786Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7235825Z return func(*args, **kwargs) 2025-12-04T13:24:33.7236221Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7236264Z return func(*args, **kwargs) 2025-12-04T13:24:33.7236485Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7236526Z return func(*args, **kwargs) 2025-12-04T13:24:33.7236819Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.7236859Z return func(*args, **kwargs) 2025-12-04T13:24:33.7237021Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7237198Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7237512Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7237670Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7237957Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7238085Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7238370Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7238520Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7238798Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7238946Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7239229Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7239370Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7239652Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7239926Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7240442Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:24:33.7240573Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7240771Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7241161Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7241275Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7241500Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7241680Z [rank2]:E1204 13:10:26.098000 447965 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7241736Z dist init r=2, world=4 2025-12-04T13:24:33.7241874Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7242036Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7242327Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7242480Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7242769Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7242895Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7243173Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7243321Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7243600Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7243747Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7244025Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7244161Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.7244442Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7244591Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7245110Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:24:33.7245227Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7245424Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7245820Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7245944Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7246169Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7246333Z [rank3]:E1204 13:10:26.107000 447966 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7246372Z dist init r=3, world=4 2025-12-04T13:24:33.7246511Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7246668Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7246961Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7247117Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7247400Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7247525Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7247802Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7247952Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7248229Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7248376Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7248652Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7248787Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7249079Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7249230Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7249782Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:24:33.7249896Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7250110Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7250510Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7250637Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7250847Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7251010Z [rank0]:E1204 13:10:26.112000 447963 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7251050Z dist init r=0, world=4 2025-12-04T13:24:33.7251188Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7251349Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7251637Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7251792Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7252077Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7252201Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7252481Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7252628Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7252904Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7253049Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7253340Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7253477Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7253756Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7253904Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7254423Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 
2025-12-04T13:24:33.7254554Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7254760Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7255145Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7255258Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7255469Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7255634Z [rank1]:E1204 13:10:26.128000 447964 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7255673Z dist init r=1, world=4 2025-12-04T13:24:33.7256011Z [rank2]:[W1204 13:10:26.792493952 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7256343Z [rank3]:[W1204 13:10:26.795684377 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7256673Z [rank0]:[W1204 13:10:26.827278473 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7257002Z [rank1]:[W1204 13:10:26.884352038 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7257042Z FAILED [46.5431s] [ 6%] 2025-12-04T13:24:33.7257045Z 2025-12-04T13:24:33.7257103Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7257227Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T13:24:33.7257274Z Traceback (most recent call last): 2025-12-04T13:24:33.7257438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7257483Z self._join_processes(fn) 2025-12-04T13:24:33.7257667Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7257725Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7257902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7257947Z raise RuntimeError(error) 2025-12-04T13:24:33.7258026Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7258073Z Traceback (most recent call last): 2025-12-04T13:24:33.7258234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7258275Z getattr(self, test_name)() 2025-12-04T13:24:33.7258444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7258490Z fn() 2025-12-04T13:24:33.7258644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7258696Z method(*args, **kwargs) 2025-12-04T13:24:33.7258847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7258887Z method(*args, **kwargs) 2025-12-04T13:24:33.7259038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7259075Z with policy(): 2025-12-04T13:24:33.7259228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7259268Z raise RuntimeError(msg) 2025-12-04T13:24:33.7259652Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472.
2025-12-04T13:24:33.7259656Z 2025-12-04T13:24:33.7259782Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7260043Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7260045Z 2025-12-04T13:24:33.7260134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7260136Z 2025-12-04T13:24:33.7260138Z 2025-12-04T13:24:33.7260214Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7260302Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7260536Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-81b2950bc580a47e.xml - 2025-12-04T13:24:33.7260599Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7260876Z FAILED [46.5431s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7260923Z Traceback (most recent call last): 2025-12-04T13:24:33.7261087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7261129Z getattr(self, test_name)() 2025-12-04T13:24:33.7261288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7261324Z fn() 2025-12-04T13:24:33.7261489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7261532Z method(*args, **kwargs) 2025-12-04T13:24:33.7261681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7261722Z method(*args, **kwargs) 2025-12-04T13:24:33.7261872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7261908Z with policy(): 2025-12-04T13:24:33.7262059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7262099Z raise RuntimeError(msg) 2025-12-04T13:24:33.7262495Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:24:33.7262528Z 2025-12-04T13:24:33.7262601Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7262857Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7262859Z 2025-12-04T13:24:33.7262945Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7263010Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
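[Note: the ProcessGroupNCCL warnings in the run above ("destroy_process_group() was not called before program exit") and the earlier barrier() warning both concern process-group lifecycle. Below is a minimal sketch of the pattern those messages ask for, assuming a recent PyTorch whose init_process_group accepts device_id and the usual LOCAL_RANK launcher convention; both the main() wiring and those assumptions are illustrative, not taken from this log.]

    # Illustrative process-group lifecycle; only the warnings' recommendations
    # come from the log, the rest is a hypothetical harness.
    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(rank)  # also what the FSDP device warnings suggest
        dist.init_process_group(
            backend="nccl",                        # maps to RCCL on ROCm builds
            device_id=torch.device("cuda", rank),  # silences the barrier() warning
        )
        try:
            ...  # test or training body
        finally:
            dist.destroy_process_group()  # avoids the resource-leak warning at exit

    if __name__ == "__main__":
        main()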
2025-12-04T13:24:33.7263072Z ======================= 1 failed, 6 deselected in 46.70s ======================= 2025-12-04T13:24:33.7263110Z Got exit code 1 2025-12-04T13:24:33.7263149Z Retrying single test... 2025-12-04T13:24:33.7263340Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3a5e7ff66ad2299b.xml 2025-12-04T13:24:33.7263400Z ============================= test session starts ============================== 2025-12-04T13:24:33.7263514Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7263555Z cachedir: .pytest_cache 2025-12-04T13:24:33.7263714Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7263762Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7263802Z configfile: pytest.ini 2025-12-04T13:24:33.7263964Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7264038Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7264293Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7264338Z Running 1 items in this shard 2025-12-04T13:24:33.7264341Z 2025-12-04T13:24:33.7264673Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 13:10:42.778000 449160 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 449229 2025-12-04T13:24:33.7264827Z I1204 13:10:42.780000 449160 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 449230 2025-12-04T13:24:33.7264979Z I1204 13:10:42.780000 449160 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 449231 2025-12-04T13:24:33.7265128Z I1204 13:10:42.781000 449160 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 449232 2025-12-04T13:24:33.7265725Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7265765Z _warn_cpu_init() 2025-12-04T13:24:33.7266330Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7266379Z _warn_cpu_init() 2025-12-04T13:24:33.7266689Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7266739Z _init_core_state( 2025-12-04T13:24:33.7267233Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7267296Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7267590Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7267628Z _init_core_state( 2025-12-04T13:24:33.7268118Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7268179Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7268747Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7268785Z _warn_cpu_init() 2025-12-04T13:24:33.7269077Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7269114Z _init_core_state( 2025-12-04T13:24:33.7269601Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7269660Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7270280Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7270318Z _warn_cpu_init() 2025-12-04T13:24:33.7270809Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7270867Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7271184Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7271249Z _init_core_state( 2025-12-04T13:24:33.7271731Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7271789Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7272271Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7272330Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7272813Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7272870Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7274142Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7274268Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7275541Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7275678Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7276952Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7277086Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7278350Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7278474Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7278702Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7278748Z return func(*args, **kwargs) 2025-12-04T13:24:33.7278974Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7279016Z return func(*args, **kwargs) 2025-12-04T13:24:33.7279239Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7279292Z return func(*args, **kwargs) 2025-12-04T13:24:33.7279517Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7279560Z return func(*args, **kwargs) 2025-12-04T13:24:33.7279820Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7279861Z return func(*args, **kwargs) 2025-12-04T13:24:33.7280081Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7280121Z return func(*args, **kwargs) 2025-12-04T13:24:33.7280362Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7280417Z return func(*args, **kwargs) 2025-12-04T13:24:33.7280653Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7280695Z return func(*args, **kwargs) 2025-12-04T13:24:33.7280986Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
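The barrier() warning above likewise names its own fix: pass `device_id` to `init_process_group`. A minimal sketch, assuming the usual rendezvous variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) are already set in the environment:

    import os
    import torch
    import torch.distributed as dist

    rank = int(os.environ["RANK"])
    torch.cuda.set_device(rank)
    dist.init_process_group(
        backend="nccl",  # on this ROCm build the "nccl" backend is backed by RCCL
        device_id=torch.device("cuda", rank),  # silences the barrier() device warning
    )
    dist.barrier()
    dist.destroy_process_group()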
2025-12-04T13:24:33.7281029Z return func(*args, **kwargs) 2025-12-04T13:24:33.7281174Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7281342Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7281636Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7281796Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7282085Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7282212Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7282496Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7282647Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7282928Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7283076Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7283357Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7283495Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7283791Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7283940Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7284451Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:24:33.7284583Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7284790Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7285195Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7285311Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7285524Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7285691Z [rank2]:E1204 13:11:15.065000 449231 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7285732Z dist init r=2, world=4
2025-12-04T13:24:33.7285875Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7286035Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7286325Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7286479Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7286770Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7286896Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7287177Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7287327Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7287604Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7287753Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7288048Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7288188Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7288465Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7288615Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7289133Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:24:33.7289269Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7289467Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7289903Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7290020Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7290233Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7290400Z [rank1]:E1204 13:11:15.068000 449230 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T13:24:33.7290540Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7290700Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7290988Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7291143Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7291431Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7291555Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7291834Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7291984Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7292280Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7292430Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7292706Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7292843Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7293122Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7293287Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7293816Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:24:33.7293929Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7294126Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7294514Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7294630Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7294841Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7295005Z [rank3]:E1204 13:11:15.069000 449232 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7295045Z dist init r=1, world=4 2025-12-04T13:24:33.7295083Z dist init r=3, world=4
2025-12-04T13:24:33.7295221Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7295381Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7295668Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7295821Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7296105Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7296229Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7296518Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7296667Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7296946Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7297093Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7297368Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7297516Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7297804Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7297964Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7298469Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472.
2025-12-04T13:24:33.7298583Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7298779Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7299169Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7299284Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7299493Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7299659Z [rank0]:E1204 13:11:15.095000 449229 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7299742Z dist init r=0, world=4 2025-12-04T13:24:33.7300079Z [rank1]:[W1204 13:11:15.793670503 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7300409Z [rank3]:[W1204 13:11:15.835842415 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7300735Z [rank2]:[W1204 13:11:15.835878704 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7301077Z [rank0]:[W1204 13:11:15.852113141 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7301119Z FAILED [46.6456s] [100%] 2025-12-04T13:24:33.7301121Z 2025-12-04T13:24:33.7301178Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7301304Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T13:24:33.7301350Z Traceback (most recent call last): 2025-12-04T13:24:33.7301515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7301559Z self._join_processes(fn) 2025-12-04T13:24:33.7301746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7301813Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7302011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7302054Z raise RuntimeError(error) 2025-12-04T13:24:33.7302135Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7302181Z Traceback (most recent call last): 2025-12-04T13:24:33.7302342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7302384Z getattr(self, test_name)() 2025-12-04T13:24:33.7302542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7302577Z fn() 2025-12-04T13:24:33.7302730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7302773Z method(*args, **kwargs) 2025-12-04T13:24:33.7302925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7302966Z method(*args, **kwargs) 2025-12-04T13:24:33.7303117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7303155Z with policy(): 2025-12-04T13:24:33.7303307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7303348Z raise RuntimeError(msg) 2025-12-04T13:24:33.7303731Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472.
2025-12-04T13:24:33.7303735Z 2025-12-04T13:24:33.7303812Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7304070Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7304073Z 2025-12-04T13:24:33.7304161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7304163Z 2025-12-04T13:24:33.7304222Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7304267Z Traceback (most recent call last): 2025-12-04T13:24:33.7304428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7304470Z getattr(self, test_name)() 2025-12-04T13:24:33.7304628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7304678Z fn() 2025-12-04T13:24:33.7304831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7304873Z method(*args, **kwargs) 2025-12-04T13:24:33.7305022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7305063Z method(*args, **kwargs) 2025-12-04T13:24:33.7305213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7305251Z with policy(): 2025-12-04T13:24:33.7305403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7305444Z raise RuntimeError(msg) 2025-12-04T13:24:33.7305834Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:24:33.7305856Z 2025-12-04T13:24:33.7305930Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7306189Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7306191Z 2025-12-04T13:24:33.7306278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7306280Z 2025-12-04T13:24:33.7306282Z 2025-12-04T13:24:33.7306359Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7306447Z Process 0 terminated with exit code 10, terminating remaining processes. 
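The failure itself comes from the harness's CUDA memory leak check (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots allocator and driver memory around the test body and raises when the numbers grow, exactly the before/after byte counts quoted in the RuntimeError above. A simplified sketch of that comparison (an illustration of the idea, not the actual harness code in common_utils.py):

    import torch

    def run_with_leak_check(test_body, dev: int = 0) -> None:
        # Snapshot caching-allocator bytes and driver-level free memory before the test.
        torch.cuda.synchronize(dev)
        alloc_before = torch.cuda.memory_allocated(dev)
        free_before, _total = torch.cuda.mem_get_info(dev)
        test_body()
        torch.cuda.synchronize(dev)
        alloc_after = torch.cuda.memory_allocated(dev)
        free_after, _total = torch.cuda.mem_get_info(dev)
        # Growth in allocator bytes or shrinkage in driver-free bytes suggests a leak.
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible leak on device {dev}: caching allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver free "
                f"{free_before} -> {free_after} bytes"
            )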
2025-12-04T13:24:33.7306683Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3a5e7ff66ad2299b.xml - 2025-12-04T13:24:33.7306746Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7307018Z FAILED [46.6456s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7307064Z Traceback (most recent call last): 2025-12-04T13:24:33.7307228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7307272Z getattr(self, test_name)() 2025-12-04T13:24:33.7307430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7307467Z fn() 2025-12-04T13:24:33.7307619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7307661Z method(*args, **kwargs) 2025-12-04T13:24:33.7307811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7307851Z method(*args, **kwargs) 2025-12-04T13:24:33.7308001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7308038Z with policy(): 2025-12-04T13:24:33.7308189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7308231Z raise RuntimeError(msg) 2025-12-04T13:24:33.7308623Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:24:33.7308628Z 2025-12-04T13:24:33.7308701Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7308957Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7308960Z 2025-12-04T13:24:33.7309046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7309048Z 2025-12-04T13:24:33.7309107Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7309151Z Traceback (most recent call last): 2025-12-04T13:24:33.7309313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7309364Z getattr(self, test_name)() 2025-12-04T13:24:33.7309536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7309584Z fn() 2025-12-04T13:24:33.7309778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7309818Z method(*args, **kwargs) 2025-12-04T13:24:33.7309969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7310008Z method(*args, **kwargs) 2025-12-04T13:24:33.7310160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7310196Z with policy(): 2025-12-04T13:24:33.7310348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7310389Z raise RuntimeError(msg) 2025-12-04T13:24:33.7310772Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:24:33.7310775Z 2025-12-04T13:24:33.7310848Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7311102Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7311104Z 2025-12-04T13:24:33.7311191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7311253Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.7311317Z ====================== 1 failed, 20 deselected in 46.81s ======================= 2025-12-04T13:24:33.7311356Z Got exit code 1 2025-12-04T13:24:33.7311398Z Retrying single test... 
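Separately from the leak itself, the ProcessGroupNCCL warnings above note that destroy_process_group() was never called before the worker processes exited. A sketch of the kind of guarded teardown that avoids that warning (illustrative only; the test harness manages its own lifecycle):

    import torch.distributed as dist

    # Guarded teardown so it is safe even if initialization failed part-way.
    try:
        pass  # training / test body would go here
    finally:
        if dist.is_available() and dist.is_initialized():
            dist.destroy_process_group()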
2025-12-04T13:24:33.7311589Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b11b91cb26e9a4c3.xml 2025-12-04T13:24:33.7311647Z ============================= test session starts ============================== 2025-12-04T13:24:33.7311758Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7311800Z cachedir: .pytest_cache 2025-12-04T13:24:33.7311959Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7312006Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7312046Z configfile: pytest.ini 2025-12-04T13:24:33.7312209Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7312302Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7312556Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7312600Z Running 1 items in this shard 2025-12-04T13:24:33.7312602Z 2025-12-04T13:24:33.7312936Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 13:11:31.717000 450426 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 450495 2025-12-04T13:24:33.7313092Z I1204 13:11:31.717000 450426 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 450496 2025-12-04T13:24:33.7313257Z I1204 13:11:31.718000 450426 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 450497 2025-12-04T13:24:33.7313424Z I1204 13:11:31.719000 450426 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 450498 2025-12-04T13:24:33.7314017Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7314055Z _warn_cpu_init() 2025-12-04T13:24:33.7314353Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7314391Z _init_core_state( 2025-12-04T13:24:33.7314889Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.7314953Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7315530Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7315568Z _warn_cpu_init() 2025-12-04T13:24:33.7315867Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7315905Z _init_core_state( 2025-12-04T13:24:33.7316395Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7316457Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7317040Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7317079Z _warn_cpu_init() 2025-12-04T13:24:33.7317371Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7317407Z _init_core_state( 2025-12-04T13:24:33.7317907Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7317983Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7318563Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7318601Z _warn_cpu_init() 2025-12-04T13:24:33.7319090Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7319150Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7319634Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7319729Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7320214Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7320274Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7320568Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:24:33.7320604Z _init_core_state( 2025-12-04T13:24:33.7321089Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7321146Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7322447Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7322588Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7323875Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7323998Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7325263Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7325388Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7326657Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7326778Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7327007Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7327050Z return func(*args, **kwargs) 2025-12-04T13:24:33.7327291Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7327345Z return func(*args, **kwargs) 2025-12-04T13:24:33.7327569Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7327623Z return func(*args, **kwargs) 2025-12-04T13:24:33.7327842Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7327884Z return func(*args, **kwargs) 2025-12-04T13:24:33.7328104Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7328145Z return func(*args, **kwargs) 2025-12-04T13:24:33.7328363Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7328406Z return func(*args, **kwargs) 2025-12-04T13:24:33.7328625Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7328666Z return func(*args, **kwargs) 2025-12-04T13:24:33.7328884Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7328926Z return func(*args, **kwargs) 2025-12-04T13:24:33.7329214Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.7329255Z return func(*args, **kwargs) 2025-12-04T13:24:33.7329401Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7329566Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7329904Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7330061Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7330348Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7330473Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7330768Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7330917Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7331194Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7331341Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7331632Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7331797Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7332075Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7332225Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7332736Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:24:33.7332854Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7333051Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7333435Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7333551Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7333763Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7333932Z [rank2]:E1204 13:12:04.029000 450497 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7333970Z dist init r=2, world=4
2025-12-04T13:24:33.7334109Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7334270Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7334559Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7334715Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7335011Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7335138Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7335414Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7335561Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7335857Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7336013Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7336303Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7336438Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7336718Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7336867Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7337375Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:24:33.7337492Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7337687Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7338075Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7338189Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7338400Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7338564Z [rank0]:E1204 13:12:04.043000 450495 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7338604Z dist init r=0, world=4
2025-12-04T13:24:33.7338741Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7338903Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7339206Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7339361Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7339645Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7339817Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7340108Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7340270Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7340560Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7340707Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7340982Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7341119Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7341399Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7341554Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7342061Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:24:33.7342176Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7342374Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7342758Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7342871Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7343080Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7343244Z [rank1]:E1204 13:12:04.112000 450496 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7343282Z dist init r=1, world=4
2025-12-04T13:24:33.7343532Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7343692Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7343986Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7344141Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7344438Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7344563Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7344862Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7345010Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7345284Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7345431Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7345709Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7345845Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7346124Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7346272Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7346779Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
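The leak checker behind these RuntimeErrors compares two counters before and after the test body: bytes held by the caching allocator and bytes allocated at the driver level, exactly the pair of numbers reported in the message just above. A minimal sketch of that kind of before/after comparison using only public torch.cuda APIs (check_for_leak is a hypothetical helper, not PyTorch's actual CUDAMemoryLeakCheck implementation):

import torch

def check_for_leak(fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free, total = torch.cuda.mem_get_info(device)
    driver_before = total - free                         # driver-level bytes in use
    fn()
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()                             # release cached blocks first
    alloc_after = torch.cuda.memory_allocated(device)
    free, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free
    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible leak: allocator {alloc_before} -> {alloc_after}, "
            f"driver {driver_before} -> {driver_after}"
        )

Driver-level usage is inferred as total minus free from torch.cuda.mem_get_info, which is why the log can report driver-allocated memory without any allocator involvement.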
2025-12-04T13:24:33.7346895Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7347091Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7347475Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7347587Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7347809Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7347973Z [rank3]:E1204 13:12:04.120000 450498 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7348012Z dist init r=3, world=4 2025-12-04T13:24:33.7348348Z [rank2]:[W1204 13:12:04.718831161 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7348682Z [rank0]:[W1204 13:12:04.726609495 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7349020Z [rank1]:[W1204 13:12:04.889636338 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7349374Z [rank3]:[W1204 13:12:04.895585428 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7349417Z FAILED [46.6473s] [100%] 2025-12-04T13:24:33.7349419Z 2025-12-04T13:24:33.7349475Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7349598Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T13:24:33.7349644Z Traceback (most recent call last): 2025-12-04T13:24:33.7349849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7349894Z self._join_processes(fn) 2025-12-04T13:24:33.7350070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7350125Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7350304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7350348Z raise RuntimeError(error) 2025-12-04T13:24:33.7350427Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7350473Z Traceback (most recent call last): 2025-12-04T13:24:33.7350633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7350676Z getattr(self, test_name)() 2025-12-04T13:24:33.7350836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7350874Z fn() 2025-12-04T13:24:33.7351026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7351068Z method(*args, **kwargs) 2025-12-04T13:24:33.7351217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7351258Z method(*args, **kwargs) 2025-12-04T13:24:33.7351408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7351445Z with policy(): 2025-12-04T13:24:33.7351596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7351638Z raise RuntimeError(msg) 2025-12-04T13:24:33.7352049Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
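The repro command these messages print can also be scripted. A sketch with subprocess, where the environment variables and test path are taken verbatim from the log and the working directory is assumed to be the base repo dir, as instructed:

import os
import subprocess

env = dict(
    os.environ,
    PYTORCH_TEST_WITH_ROCM="1",
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
)
result = subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda",
    ],
    env=env,
    check=False,  # inspect the return code instead of raising
)
print("exit code:", result.returncode)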
2025-12-04T13:24:33.7352055Z 2025-12-04T13:24:33.7352129Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7352388Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7352390Z 2025-12-04T13:24:33.7352477Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7352479Z 2025-12-04T13:24:33.7352538Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7352581Z Traceback (most recent call last): 2025-12-04T13:24:33.7352759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7352828Z getattr(self, test_name)() 2025-12-04T13:24:33.7352989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7353023Z fn() 2025-12-04T13:24:33.7353176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7353216Z method(*args, **kwargs) 2025-12-04T13:24:33.7353369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7353409Z method(*args, **kwargs) 2025-12-04T13:24:33.7353559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7353595Z with policy(): 2025-12-04T13:24:33.7353748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7353790Z raise RuntimeError(msg) 2025-12-04T13:24:33.7354169Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:24:33.7354171Z 2025-12-04T13:24:33.7354247Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7354502Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7354505Z 2025-12-04T13:24:33.7354593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7354595Z 2025-12-04T13:24:33.7354598Z 2025-12-04T13:24:33.7354674Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7354763Z Process 0 terminated with exit code 10, terminating remaining processes. 
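"Process 0 terminated with exit code 10, terminating remaining processes" reflects the spawn-and-join pattern the test harness uses: each rank runs the test in its own process, and the parent joins them and inspects exit codes. A simplified sketch of that pattern (worker is a hypothetical stand-in for the per-rank test body, not the harness's real code):

import sys
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    # the per-rank test body would run here; a leak-check failure
    # would exit with code 10, as seen in the log above
    sys.exit(0)

if __name__ == "__main__":
    world_size = 4
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(r, world_size)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    if any(p.exitcode != 0 for p in procs):
        raise RuntimeError(f"process exit codes: {[p.exitcode for p in procs]}")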
2025-12-04T13:24:33.7354997Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b11b91cb26e9a4c3.xml - 2025-12-04T13:24:33.7355058Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7355331Z FAILED [46.6473s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7355377Z Traceback (most recent call last): 2025-12-04T13:24:33.7355542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7355584Z getattr(self, test_name)() 2025-12-04T13:24:33.7355757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7355794Z fn() 2025-12-04T13:24:33.7355946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7355985Z method(*args, **kwargs) 2025-12-04T13:24:33.7356136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7356175Z method(*args, **kwargs) 2025-12-04T13:24:33.7356326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7356362Z with policy(): 2025-12-04T13:24:33.7356526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7356579Z raise RuntimeError(msg) 2025-12-04T13:24:33.7356960Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
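The failure is also recorded in the junit XML report named above. A hedged sketch of pulling failed cases out of such a report with the standard library, assuming the usual junit layout of testcase elements containing a failure child (the path is the one this log printed):

import xml.etree.ElementTree as ET

report = ("test-reports/python-pytest/distributed.fsdp.test_fsdp_core/"
          "distributed.fsdp.test_fsdp_core-b11b91cb26e9a4c3.xml")
tree = ET.parse(report)
for case in tree.getroot().iter("testcase"):
    failure = case.find("failure")
    if failure is not None:
        # classname and name identify the test; the element text carries the traceback
        print(case.get("classname"), case.get("name"))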
2025-12-04T13:24:33.7356973Z 2025-12-04T13:24:33.7357047Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7357303Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7357305Z 2025-12-04T13:24:33.7357392Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7357394Z 2025-12-04T13:24:33.7357453Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7357499Z Traceback (most recent call last): 2025-12-04T13:24:33.7357663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7357706Z getattr(self, test_name)() 2025-12-04T13:24:33.7357865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7357900Z fn() 2025-12-04T13:24:33.7358051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7358092Z method(*args, **kwargs) 2025-12-04T13:24:33.7358241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7358282Z method(*args, **kwargs) 2025-12-04T13:24:33.7358433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7358472Z with policy(): 2025-12-04T13:24:33.7358623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7358665Z raise RuntimeError(msg) 2025-12-04T13:24:33.7359044Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:24:33.7359048Z 2025-12-04T13:24:33.7359119Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7359375Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7359379Z 2025-12-04T13:24:33.7359476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7359542Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
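Each failing run above also ends with ProcessGroupNCCL warnings that destroy_process_group() was not called before exit. A minimal sketch of the teardown the warning asks for, assuming torchrun-style LOCAL_RANK in the environment; on recent PyTorch, passing device_id to init_process_group additionally mutes the barrier() device warning that appears further down in this log:

import os
import torch
import torch.distributed as dist

def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(
        backend="nccl",
        device_id=torch.device("cuda", local_rank),
    )
    try:
        dist.barrier()  # test or training body goes here
    finally:
        # explicit teardown, per the shutdown docs linked in the warning
        dist.destroy_process_group()

if __name__ == "__main__":
    main()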
2025-12-04T13:24:33.7359607Z ====================== 1 failed, 20 deselected in 46.81s ======================= 2025-12-04T13:24:33.7359645Z Got exit code 1 2025-12-04T13:24:33.7359897Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:24:33.7360027Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.7360216Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56298d41b6be4b3c.xml 2025-12-04T13:24:33.7360275Z ============================= test session starts ============================== 2025-12-04T13:24:33.7360406Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7360465Z cachedir: .pytest_cache 2025-12-04T13:24:33.7360642Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7360690Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7360731Z configfile: pytest.ini 2025-12-04T13:24:33.7360894Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7360968Z collecting ... collected 60 items / 7 deselected / 53 selected 2025-12-04T13:24:33.7361022Z stepcurrent: skipping 7 already run items. 2025-12-04T13:24:33.7361066Z Running 14 items in this shard 2025-12-04T13:24:33.7361068Z 2025-12-04T13:24:33.7361413Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 13:12:20.785000 451692 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 451761 2025-12-04T13:24:33.7361568Z I1204 13:12:20.786000 451692 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 451762 2025-12-04T13:24:33.7361719Z I1204 13:12:20.786000 451692 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 451763 2025-12-04T13:24:33.7361869Z I1204 13:12:20.787000 451692 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 451764 2025-12-04T13:24:33.7362450Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7362489Z _warn_cpu_init() 2025-12-04T13:24:33.7362798Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7362836Z _init_core_state( 2025-12-04T13:24:33.7363329Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7363390Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7363975Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7364016Z _warn_cpu_init() 2025-12-04T13:24:33.7364316Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7364355Z _init_core_state( 2025-12-04T13:24:33.7364853Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7364942Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7365512Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7365550Z _warn_cpu_init() 2025-12-04T13:24:33.7366115Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7366153Z _warn_cpu_init() 2025-12-04T13:24:33.7366451Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7366488Z _init_core_state( 2025-12-04T13:24:33.7366978Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
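The device_id warnings repeated here spell out their own fix: pin the process to its GPU before FSDP initialization, or pass an explicit device index instead of a bare "cuda". A short sketch under the assumption that a process group is already initialized (wrap_on and the toy nn.Linear are illustrative only):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on(rank: int) -> FSDP:
    # pin this process to its GPU before FSDP init, as the warning recommends
    torch.cuda.set_device(rank)
    model = nn.Linear(8, 8)
    # an explicit integer index avoids the "does not have an explicit index" warning
    return FSDP(model, device_id=rank)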
2025-12-04T13:24:33.7367037Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7367526Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7367586Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7368071Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7368142Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7368440Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7368479Z _init_core_state( 2025-12-04T13:24:33.7368965Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7369022Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7369518Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7369595Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7370919Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7371047Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7372317Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7372441Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7373714Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7373838Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7375111Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
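The repeated AccumulateGrad stream-mismatch warnings above name their own opt-out switch. If the mismatch is known to be intentional, as the message says, it can be silenced with one call:

import torch

# only appropriate when the stream mismatch is intentional, per the warning text
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)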
2025-12-04T13:24:33.7375256Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7375487Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7375532Z return func(*args, **kwargs) 2025-12-04T13:24:33.7375757Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7375799Z return func(*args, **kwargs) 2025-12-04T13:24:33.7376024Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7376066Z return func(*args, **kwargs) 2025-12-04T13:24:33.7376286Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7376329Z return func(*args, **kwargs) 2025-12-04T13:24:33.7376549Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7376592Z return func(*args, **kwargs) 2025-12-04T13:24:33.7376811Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7376852Z return func(*args, **kwargs) 2025-12-04T13:24:33.7377070Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7377111Z return func(*args, **kwargs) 2025-12-04T13:24:33.7377332Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7377374Z return func(*args, **kwargs) 2025-12-04T13:24:33.7377677Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.7377720Z return func(*args, **kwargs) 2025-12-04T13:24:33.7377863Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7378027Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7378318Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7378484Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7378783Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7378920Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7379199Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7379347Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7379629Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7379816Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7380092Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7380229Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7380506Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7380657Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7381177Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:24:33.7381294Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7381490Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7381905Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7382025Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7382239Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7382403Z [rank2]:E1204 13:12:52.876000 451763 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7382442Z dist init r=2, world=4 2025-12-04T13:24:33.7382581Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7382752Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7383059Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7383233Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7383516Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7383642Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7383918Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7384068Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7384345Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7384493Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7384768Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7384905Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] 
with policy(): 2025-12-04T13:24:33.7385186Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7385335Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7385853Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:24:33.7385969Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7386176Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7386577Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7386690Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7386905Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7387078Z [rank0]:E1204 13:12:52.877000 451761 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7387128Z dist init r=0, world=4 2025-12-04T13:24:33.7387266Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7387438Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7387725Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7387878Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7388164Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7388289Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7388566Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7388712Z [rank1]:E1204 13:12:52.921000 451762 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7388995Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7389145Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7389422Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7389560Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7389879Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7390029Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7390556Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:24:33.7390673Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7390870Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7391266Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7391395Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7391619Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7391798Z [rank1]:E1204 13:12:52.921000 451762 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7391836Z dist init r=1, world=4 2025-12-04T13:24:33.7391974Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7392134Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7392422Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in 
run_test 2025-12-04T13:24:33.7392577Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7392861Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7392985Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7393260Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7393408Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7393687Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7393836Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7394113Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7394248Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7394527Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7394685Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7395201Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
2025-12-04T13:24:33.7395316Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7395509Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7395921Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7396060Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7396275Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7396438Z [rank3]:E1204 13:12:52.939000 451764 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7396478Z dist init r=3, world=4 2025-12-04T13:24:33.7396815Z [rank2]:[W1204 13:12:53.552894976 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7397147Z [rank0]:[W1204 13:12:53.566070501 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7397474Z [rank1]:[W1204 13:12:53.718495760 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7397799Z [rank3]:[W1204 13:12:53.727332163 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7397841Z FAILED [46.4488s] [ 7%] 2025-12-04T13:24:33.7397844Z 2025-12-04T13:24:33.7397903Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7398042Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T13:24:33.7398089Z Traceback (most recent call last): 2025-12-04T13:24:33.7398254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7398298Z self._join_processes(fn) 2025-12-04T13:24:33.7398476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7398530Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7398710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7398754Z raise RuntimeError(error) 2025-12-04T13:24:33.7398848Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7398896Z Traceback (most recent call last): 2025-12-04T13:24:33.7399055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7399098Z getattr(self, test_name)() 2025-12-04T13:24:33.7399254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7399290Z fn() 2025-12-04T13:24:33.7399441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7399484Z method(*args, **kwargs) 2025-12-04T13:24:33.7399634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7399731Z method(*args, **kwargs) 2025-12-04T13:24:33.7399900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7399956Z with policy(): 2025-12-04T13:24:33.7400106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7400148Z raise RuntimeError(msg) 2025-12-04T13:24:33.7400538Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
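The "full_state_dict will be returned" warnings earlier in this run come from gathering a state dict while the sharding strategy is NO_SHARD. Requesting a full state dict explicitly makes that intent visible; a sketch using the FSDP1-style context manager, assuming model is already FSDP-wrapped:

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import StateDictType

def full_state_dict(model: FSDP) -> dict:
    # under NO_SHARD a full (unsharded) state dict is returned in any case
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
        return model.state_dict()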
2025-12-04T13:24:33.7400541Z 2025-12-04T13:24:33.7400617Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7400892Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7400896Z 2025-12-04T13:24:33.7400983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7400985Z 2025-12-04T13:24:33.7400986Z 2025-12-04T13:24:33.7401061Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7401149Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7401383Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56298d41b6be4b3c.xml - 2025-12-04T13:24:33.7401444Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7401729Z FAILED [46.4488s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7401777Z Traceback (most recent call last): 2025-12-04T13:24:33.7401943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7401985Z getattr(self, test_name)() 2025-12-04T13:24:33.7402145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7402180Z fn() 2025-12-04T13:24:33.7402330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7402372Z method(*args, **kwargs) 2025-12-04T13:24:33.7402520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7402562Z method(*args, **kwargs) 2025-12-04T13:24:33.7402726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7402765Z with policy(): 2025-12-04T13:24:33.7402917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7402959Z raise RuntimeError(msg) 2025-12-04T13:24:33.7403349Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:24:33.7403351Z 2025-12-04T13:24:33.7403425Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7407646Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7407660Z 2025-12-04T13:24:33.7407751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7407833Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
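Editor's note: the RuntimeError in this failure comes from PyTorch's CUDA memory-leak checker, enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it snapshots allocator state before the test body and fails the test if memory is still held afterwards. A rough, hypothetical sketch of that before/after policy; the real checker in common_utils.py also consults driver-level memory, which is where the "CUDA driver allocated memory was ... and is now ..." numbers come from:

    import torch

    class LeakCheck:
        """Simplified stand-in for the mem_leak_check policy used by the harness."""

        def __enter__(self):
            torch.cuda.synchronize()
            self.before = torch.cuda.memory_allocated()
            return self

        def __exit__(self, exc_type, exc, tb):
            torch.cuda.synchronize()
            torch.cuda.empty_cache()
            after = torch.cuda.memory_allocated()
            if exc_type is None and after > self.before:
                raise RuntimeError(
                    f"possible CUDA memory leak: {self.before} -> {after} bytes"
                )
            return False  # never swallow the test's own exception

Usage mirrors the `with policy():` frame in the traceback above: the test body runs inside `with LeakCheck():`.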
2025-12-04T13:24:33.7407894Z ======================= 1 failed, 7 deselected in 46.61s ======================= 2025-12-04T13:24:33.7407933Z Got exit code 1 2025-12-04T13:24:33.7407973Z Retrying single test... 2025-12-04T13:24:33.7408163Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-db02c5151641a7ea.xml 2025-12-04T13:24:33.7408221Z ============================= test session starts ============================== 2025-12-04T13:24:33.7408334Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7408375Z cachedir: .pytest_cache 2025-12-04T13:24:33.7408535Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7408583Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7408625Z configfile: pytest.ini 2025-12-04T13:24:33.7408786Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7408861Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7409125Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7409171Z Running 1 items in this shard 2025-12-04T13:24:33.7409173Z 2025-12-04T13:24:33.7409519Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 13:13:09.777000 452958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 453027 2025-12-04T13:24:33.7409676Z I1204 13:13:09.778000 452958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 453028 2025-12-04T13:24:33.7409882Z I1204 13:13:09.779000 452958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 453029 2025-12-04T13:24:33.7410031Z I1204 13:13:09.780000 452958 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 453030 2025-12-04T13:24:33.7410611Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7410649Z _warn_cpu_init() 2025-12-04T13:24:33.7410974Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7411012Z _init_core_state( 2025-12-04T13:24:33.7411506Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7411568Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7412152Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7412217Z _warn_cpu_init() 2025-12-04T13:24:33.7412515Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7412553Z _init_core_state( 2025-12-04T13:24:33.7413042Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7413105Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7413672Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7413709Z _warn_cpu_init() 2025-12-04T13:24:33.7414008Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7414044Z _init_core_state( 2025-12-04T13:24:33.7414537Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7414599Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7415164Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
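Editor's note: the `device_id` warnings above ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") state both remedies themselves. A minimal sketch of the two, assuming the default process group is already initialized and LOCAL_RANK is set by the launcher:

    import os
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    # Remedy 1: make the current device explicit before FSDP initialization.
    torch.cuda.set_device(local_rank)
    # Remedy 2: pass an indexed device rather than the bare "cuda" string.
    model = FSDP(nn.Linear(8, 8), device_id=torch.device(f"cuda:{local_rank}"))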
2025-12-04T13:24:33.7415202Z _warn_cpu_init() 2025-12-04T13:24:33.7415701Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7415761Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7416256Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7416313Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7416811Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7416881Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7417179Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7417216Z _init_core_state( 2025-12-04T13:24:33.7417701Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7417761Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7419033Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7419160Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7420484Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7420608Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7421883Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7422029Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7423292Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
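Editor's note: the AccumulateGrad stream-mismatch warning repeated above also names its remedies: drop lingering references to the autograd graph (for example, a retained loss tensor), or, when the mismatch is intentional, disable the check with the toggle the warning quotes. A one-line sketch of the opt-out:

    import torch

    # Only for intentional mismatches: silences the AccumulateGrad stream warning.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)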
2025-12-04T13:24:33.7423414Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7423642Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7423688Z return func(*args, **kwargs) 2025-12-04T13:24:33.7423911Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7423954Z return func(*args, **kwargs) 2025-12-04T13:24:33.7424176Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7424217Z return func(*args, **kwargs) 2025-12-04T13:24:33.7424436Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7424478Z return func(*args, **kwargs) 2025-12-04T13:24:33.7424707Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7424750Z return func(*args, **kwargs) 2025-12-04T13:24:33.7424967Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7425008Z return func(*args, **kwargs) 2025-12-04T13:24:33.7425228Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7425268Z return func(*args, **kwargs) 2025-12-04T13:24:33.7425486Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7425537Z return func(*args, **kwargs) 2025-12-04T13:24:33.7425839Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
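Editor's note: the barrier() warning that closes the block above suggests passing `device_id` to `init_process_group` so collectives do not have to infer a device from context. A minimal sketch, again with placeholder rendezvous settings rather than this job's real ones:

    import os
    import torch
    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    dist.init_process_group(
        backend="nccl",
        rank=int(os.environ.get("RANK", "0")),
        world_size=int(os.environ.get("WORLD_SIZE", "1")),
        # Binding the group to a concrete device mutes the barrier() warning.
        device_id=torch.device(f"cuda:{local_rank}"),
    )
    dist.barrier()
    dist.destroy_process_group()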
2025-12-04T13:24:33.7425890Z return func(*args, **kwargs) 2025-12-04T13:24:33.7426035Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7426198Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7426492Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7426649Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7426937Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7427063Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7427340Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7427490Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7427769Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7427920Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7428195Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7428333Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7428612Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7428763Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7429298Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:24:33.7429415Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7429612Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7430067Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7430208Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7430421Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7430585Z [rank0]:E1204 13:13:41.950000 453027 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7430624Z dist init r=0, world=4 2025-12-04T13:24:33.7430762Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7430923Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7431214Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7431369Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7431652Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7431778Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7432056Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7432204Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7432483Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7432630Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7432908Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7433044Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] 
with policy(): 2025-12-04T13:24:33.7433334Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7433484Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7434002Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:24:33.7434129Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7434335Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7434753Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7434867Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7435077Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7435242Z [rank2]:E1204 13:13:41.953000 453029 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7435281Z dist init r=2, world=4 2025-12-04T13:24:33.7435420Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7435580Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7435870Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7436023Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7436309Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7436435Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7436712Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7436860Z [rank1]:E1204 13:13:41.954000 453028 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7437135Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7437283Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7437566Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7437706Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7437983Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7438133Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7438659Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:24:33.7438795Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7438991Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7439385Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7439500Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7439756Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7439920Z [rank1]:E1204 13:13:41.954000 453028 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7439960Z dist init r=1, world=4 2025-12-04T13:24:33.7440097Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7440256Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7440546Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in 
run_test 2025-12-04T13:24:33.7440703Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7440988Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7441112Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7441390Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7441536Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7441827Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7441975Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7442251Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7442387Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7442677Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7442842Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7443375Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
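Editor's note: the "exiting process N with exit code: 10" lines and the `_check_return_codes` frames reflect the harness pattern: one OS process per rank, joined by the parent, with any non-zero exit code converted into the test failure reported here. A generic sketch of that pattern, not the actual common_distributed.py implementation:

    import multiprocessing as mp

    def _worker(rank: int, world_size: int) -> None:
        # Per-rank test body; exiting non-zero marks this rank as failed.
        pass

    def run_multiprocess(world_size: int = 4) -> None:
        procs = [
            mp.Process(target=_worker, args=(rank, world_size))
            for rank in range(world_size)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(
                    f"Process {rank} exited with error code {p.exitcode}"
                )

    if __name__ == "__main__":
        run_multiprocess()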
2025-12-04T13:24:33.7443490Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7443684Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7444081Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7444195Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7444406Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7444569Z [rank3]:E1204 13:13:41.962000 453030 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7444607Z dist init r=3, world=4 2025-12-04T13:24:33.7444944Z [rank1]:[W1204 13:13:42.627350123 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7445276Z [rank0]:[W1204 13:13:42.632178335 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7445605Z [rank3]:[W1204 13:13:42.632735846 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7445932Z [rank2]:[W1204 13:13:42.649373519 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7445973Z FAILED [46.4475s] [100%] 2025-12-04T13:24:33.7445976Z 2025-12-04T13:24:33.7446043Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7446181Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T13:24:33.7446229Z Traceback (most recent call last): 2025-12-04T13:24:33.7446392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7446438Z self._join_processes(fn) 2025-12-04T13:24:33.7446613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7446667Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7446855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7446901Z raise RuntimeError(error) 2025-12-04T13:24:33.7446994Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7447052Z Traceback (most recent call last): 2025-12-04T13:24:33.7447212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7447256Z getattr(self, test_name)() 2025-12-04T13:24:33.7447414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7447450Z fn() 2025-12-04T13:24:33.7447602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7447645Z method(*args, **kwargs) 2025-12-04T13:24:33.7447796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7447838Z method(*args, **kwargs) 2025-12-04T13:24:33.7447991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7448029Z with policy(): 2025-12-04T13:24:33.7448182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7448223Z raise RuntimeError(msg) 2025-12-04T13:24:33.7448613Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:24:33.7448616Z 2025-12-04T13:24:33.7448690Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7448962Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7448966Z 2025-12-04T13:24:33.7449053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7449055Z 2025-12-04T13:24:33.7449056Z 2025-12-04T13:24:33.7449133Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7449222Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7449454Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-db02c5151641a7ea.xml - 2025-12-04T13:24:33.7449516Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7449839Z FAILED [46.4475s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7449901Z Traceback (most recent call last): 2025-12-04T13:24:33.7450067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7450111Z getattr(self, test_name)() 2025-12-04T13:24:33.7450269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7450305Z fn() 2025-12-04T13:24:33.7450456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7450496Z method(*args, **kwargs) 2025-12-04T13:24:33.7450645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7450686Z method(*args, **kwargs) 2025-12-04T13:24:33.7450847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7450906Z with policy(): 2025-12-04T13:24:33.7451073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7451114Z raise RuntimeError(msg) 2025-12-04T13:24:33.7451504Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:24:33.7451507Z 2025-12-04T13:24:33.7451580Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7451851Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7451854Z 2025-12-04T13:24:33.7451941Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7452004Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
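Editor's note: "Got exit code 1 ... Retrying single test..." (below, as after the first failure) shows the runner's recovery path: it re-invokes pytest on just the failed node id with a fresh report file, while the stepcurrent plugin skips items that already ran. A hypothetical sketch of such a retry loop; the real logic lives in PyTorch's test runner, and the exact flags it passes are not reproduced here:

    import subprocess
    import sys

    TEST_ID = (
        "test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::"
        "test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda"
    )

    def retry_single_test(max_attempts: int = 3) -> int:
        # Re-run only the failed node id, mirroring "Retrying single test...".
        rc = 1
        for _ in range(max_attempts):
            rc = subprocess.call([sys.executable, "-m", "pytest", "-v", "-x", TEST_ID])
            if rc == 0:
                break
        return rc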
2025-12-04T13:24:33.7452066Z ====================== 1 failed, 20 deselected in 46.61s ======================= 2025-12-04T13:24:33.7452105Z Got exit code 1 2025-12-04T13:24:33.7452145Z Retrying single test... 2025-12-04T13:24:33.7452338Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-17bc4be60a4b0776.xml 2025-12-04T13:24:33.7452395Z ============================= test session starts ============================== 2025-12-04T13:24:33.7452508Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7452549Z cachedir: .pytest_cache 2025-12-04T13:24:33.7452710Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7452757Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7452800Z configfile: pytest.ini 2025-12-04T13:24:33.7452961Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7453037Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7453300Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7453343Z Running 1 items in this shard 2025-12-04T13:24:33.7453345Z 2025-12-04T13:24:33.7453687Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 13:13:58.678000 454224 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 454293 2025-12-04T13:24:33.7453852Z I1204 13:13:58.678000 454224 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 454294 2025-12-04T13:24:33.7454006Z I1204 13:13:58.679000 454224 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 454295 2025-12-04T13:24:33.7454156Z I1204 13:13:58.680000 454224 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 454296 2025-12-04T13:24:33.7454749Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7454798Z _warn_cpu_init() 2025-12-04T13:24:33.7455101Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7455154Z _init_core_state( 2025-12-04T13:24:33.7455645Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7455707Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7456276Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7456316Z _warn_cpu_init() 2025-12-04T13:24:33.7456615Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7456652Z _init_core_state( 2025-12-04T13:24:33.7457146Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7457207Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7457772Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7457810Z _warn_cpu_init() 2025-12-04T13:24:33.7458108Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7458145Z _init_core_state( 2025-12-04T13:24:33.7458642Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7458703Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7459278Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
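Editor's note: the CPU-init warning repeated above recommends handing FSDP a `device_id` so sharding initialization runs on the GPU and `sync_module_states=True` can use GPU collectives. A minimal sketch under the assumption that the default process group is already initialized and LOCAL_RANK is set:

    import os
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    module = nn.Linear(16, 16)  # still on CPU at this point, as in the warning
    # device_id moves the module to its GPU for sharding init and state sync.
    model = FSDP(module, device_id=local_rank, sync_module_states=True)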
2025-12-04T13:24:33.7459326Z _warn_cpu_init() 2025-12-04T13:24:33.7459858Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7459929Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7460414Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7460471Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7460956Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7461014Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7461312Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7461349Z _init_core_state( 2025-12-04T13:24:33.7461839Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7461899Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7463181Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7463307Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7464581Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7464728Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7465988Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7466109Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7467368Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:24:33.7467489Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7467729Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7467774Z return func(*args, **kwargs) 2025-12-04T13:24:33.7467998Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7468040Z return func(*args, **kwargs) 2025-12-04T13:24:33.7468261Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7468301Z return func(*args, **kwargs) 2025-12-04T13:24:33.7468523Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7468644Z return func(*args, **kwargs) 2025-12-04T13:24:33.7468884Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7468945Z return func(*args, **kwargs) 2025-12-04T13:24:33.7469165Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7469205Z return func(*args, **kwargs) 2025-12-04T13:24:33.7469426Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7469465Z return func(*args, **kwargs) 2025-12-04T13:24:33.7469683Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7469755Z return func(*args, **kwargs) 2025-12-04T13:24:33.7470048Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
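The barrier() warning at the end of this block is silenced by binding the default process group to a device at init time, as the message says. A sketch, assuming a recent PyTorch where init_process_group accepts `device_id` and a torchrun-style launch:

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])  # assumption: torchrun sets this
    # With device_id set, barrier() no longer guesses the device from the current context.
    dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))
    dist.barrier()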
2025-12-04T13:24:33.7470089Z return func(*args, **kwargs) 2025-12-04T13:24:33.7470232Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7470396Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7470686Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7470844Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7471131Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7471257Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7471535Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7471687Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7471982Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7472131Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7472409Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7472545Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7472825Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7472986Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7473519Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:24:33.7473649Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7473844Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7474246Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7474361Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7474573Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7474737Z [rank0]:E1204 13:14:30.906000 454293 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7474776Z dist init r=0, world=4 2025-12-04T13:24:33.7474914Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7475075Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7475365Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7475519Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7475804Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7475928Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7476208Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7476370Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7476650Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7476798Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7477073Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7477220Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] 
with policy(): 2025-12-04T13:24:33.7477507Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7477667Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7478182Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:24:33.7478299Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7478499Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7478898Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7479013Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7479223Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7479388Z [rank2]:E1204 13:14:30.920000 454295 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7479428Z dist init r=2, world=4 2025-12-04T13:24:33.7479567Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7479774Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7480061Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7480215Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7480500Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7480647Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7480927Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7481077Z [rank1]:E1204 13:14:30.960000 454294 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7481354Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7481513Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7481791Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7481950Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7482228Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7482376Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7482894Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:24:33.7483011Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7483207Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7483604Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7483718Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7483930Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7484094Z [rank1]:E1204 13:14:30.960000 454294 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7484133Z dist init r=1, world=4 2025-12-04T13:24:33.7484269Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7484429Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7484716Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in 
run_test 2025-12-04T13:24:33.7484885Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7485171Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7485294Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7485572Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7485720Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7486007Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7486177Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7486452Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7486588Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7486866Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7487016Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7487531Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
2025-12-04T13:24:33.7487646Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7487843Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7488241Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7488355Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7488564Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7488728Z [rank3]:E1204 13:14:30.979000 454296 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7488765Z dist init r=3, world=4 2025-12-04T13:24:33.7491037Z [rank0]:[W1204 13:14:31.584435334 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7491380Z [rank2]:[W1204 13:14:31.620910902 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7491709Z [rank1]:[W1204 13:14:31.719070222 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7492036Z [rank3]:[W1204 13:14:31.800557548 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7492095Z FAILED [46.6444s] [100%] 2025-12-04T13:24:33.7492112Z 2025-12-04T13:24:33.7492173Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7492324Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T13:24:33.7492372Z Traceback (most recent call last): 2025-12-04T13:24:33.7492539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7492584Z self._join_processes(fn) 2025-12-04T13:24:33.7492757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7492812Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7492992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7493037Z raise RuntimeError(error) 2025-12-04T13:24:33.7493121Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7493166Z Traceback (most recent call last): 2025-12-04T13:24:33.7493327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7493369Z getattr(self, test_name)() 2025-12-04T13:24:33.7493527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7493562Z fn() 2025-12-04T13:24:33.7493716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7493757Z method(*args, **kwargs) 2025-12-04T13:24:33.7493909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7493951Z method(*args, **kwargs) 2025-12-04T13:24:33.7494104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7494142Z with policy(): 2025-12-04T13:24:33.7494295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7494335Z raise RuntimeError(msg) 2025-12-04T13:24:33.7494729Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:24:33.7494731Z 2025-12-04T13:24:33.7494806Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7495088Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7495092Z 2025-12-04T13:24:33.7495182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7495184Z 2025-12-04T13:24:33.7495244Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7495290Z Traceback (most recent call last): 2025-12-04T13:24:33.7495451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7495494Z getattr(self, test_name)() 2025-12-04T13:24:33.7495650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7495686Z fn() 2025-12-04T13:24:33.7495837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7495896Z method(*args, **kwargs) 2025-12-04T13:24:33.7496057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7496111Z method(*args, **kwargs) 2025-12-04T13:24:33.7496259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7496297Z with policy(): 2025-12-04T13:24:33.7496447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7496490Z raise RuntimeError(msg) 2025-12-04T13:24:33.7496878Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 
2025-12-04T13:24:33.7496882Z 2025-12-04T13:24:33.7496957Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7497226Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7497229Z 2025-12-04T13:24:33.7497316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7497318Z 2025-12-04T13:24:33.7497377Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7497421Z Traceback (most recent call last): 2025-12-04T13:24:33.7497582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7497624Z getattr(self, test_name)() 2025-12-04T13:24:33.7497782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7497816Z fn() 2025-12-04T13:24:33.7497968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7498010Z method(*args, **kwargs) 2025-12-04T13:24:33.7498160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7498199Z method(*args, **kwargs) 2025-12-04T13:24:33.7498350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7498386Z with policy(): 2025-12-04T13:24:33.7498537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7498578Z raise RuntimeError(msg) 2025-12-04T13:24:33.7498976Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:24:33.7498980Z 2025-12-04T13:24:33.7499054Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7499320Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7499323Z 2025-12-04T13:24:33.7499411Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7499413Z 2025-12-04T13:24:33.7499414Z 2025-12-04T13:24:33.7499492Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7499583Z Process 0 terminated with exit code 10, terminating remaining processes. 
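The leak check that raised these RuntimeErrors snapshots the caching allocator before the test body and compares afterwards; the real implementation lives in torch/testing/_internal/common_utils.py. A simplified sketch of the comparison it performs (run_test_body is a hypothetical stand-in for the actual test):

    import torch

    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()   # e.g. 512 bytes in the failure above

    run_test_body()                          # the actual test would run here

    torch.cuda.synchronize()
    after = torch.cuda.memory_allocated()    # e.g. 117248 bytes in the failure above
    if after > before:
        raise RuntimeError(f"possible CUDA memory leak: {before} -> {after} bytes")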
2025-12-04T13:24:33.7499887Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-17bc4be60a4b0776.xml - 2025-12-04T13:24:33.7499986Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7500273Z FAILED [46.6444s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7500319Z Traceback (most recent call last): 2025-12-04T13:24:33.7500482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7500523Z getattr(self, test_name)() 2025-12-04T13:24:33.7500686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7500719Z fn() 2025-12-04T13:24:33.7500873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7500914Z method(*args, **kwargs) 2025-12-04T13:24:33.7501064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7501103Z method(*args, **kwargs) 2025-12-04T13:24:33.7501254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7501290Z with policy(): 2025-12-04T13:24:33.7501442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7501482Z raise RuntimeError(msg) 2025-12-04T13:24:33.7501875Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:24:33.7501879Z 2025-12-04T13:24:33.7501953Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7502221Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7502223Z 2025-12-04T13:24:33.7502309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7502312Z 2025-12-04T13:24:33.7502368Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7502414Z Traceback (most recent call last): 2025-12-04T13:24:33.7502574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7502617Z getattr(self, test_name)() 2025-12-04T13:24:33.7502791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7502828Z fn() 2025-12-04T13:24:33.7502978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7503018Z method(*args, **kwargs) 2025-12-04T13:24:33.7503167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7503207Z method(*args, **kwargs) 2025-12-04T13:24:33.7503356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7503393Z with policy(): 2025-12-04T13:24:33.7503543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7503595Z raise RuntimeError(msg) 2025-12-04T13:24:33.7503983Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 
2025-12-04T13:24:33.7504011Z 2025-12-04T13:24:33.7504083Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7504351Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7504353Z 2025-12-04T13:24:33.7504437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7504440Z 2025-12-04T13:24:33.7504497Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7504541Z Traceback (most recent call last): 2025-12-04T13:24:33.7504705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7504747Z getattr(self, test_name)() 2025-12-04T13:24:33.7504906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7504939Z fn() 2025-12-04T13:24:33.7505090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7505130Z method(*args, **kwargs) 2025-12-04T13:24:33.7505281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7505320Z method(*args, **kwargs) 2025-12-04T13:24:33.7505471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7505508Z with policy(): 2025-12-04T13:24:33.7505662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7505704Z raise RuntimeError(msg) 2025-12-04T13:24:33.7506091Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:24:33.7506094Z 2025-12-04T13:24:33.7506166Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7506431Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7506434Z 2025-12-04T13:24:33.7506519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7506594Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
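Alongside the leak failure, every rank earlier logged the ProcessGroupNCCL shutdown warning; the fix it asks for is an explicit teardown before the process exits. A minimal sketch, assuming the process group was initialized earlier in the same process:

    import torch.distributed as dist

    # ... test or training body ...
    if dist.is_initialized():
        dist.destroy_process_group()  # explicit shutdown instead of relying on interpreter exit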
2025-12-04T13:24:33.7506660Z ====================== 1 failed, 20 deselected in 46.80s ======================= 2025-12-04T13:24:33.7506696Z Got exit code 1 2025-12-04T13:24:33.7506915Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.7507045Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.7507233Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77e2cce3f6d16ae3.xml 2025-12-04T13:24:33.7507293Z ============================= test session starts ============================== 2025-12-04T13:24:33.7507416Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7507470Z cachedir: .pytest_cache 2025-12-04T13:24:33.7507630Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7507690Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7507730Z configfile: pytest.ini 2025-12-04T13:24:33.7507894Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7507966Z collecting ... collected 60 items / 8 deselected / 52 selected 2025-12-04T13:24:33.7508019Z stepcurrent: skipping 8 already run items. 2025-12-04T13:24:33.7508062Z Running 13 items in this shard 2025-12-04T13:24:33.7508064Z 2025-12-04T13:24:33.7508410Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 13:14:47.630000 455490 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 455559 2025-12-04T13:24:33.7508565Z I1204 13:14:47.630000 455490 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 455560 2025-12-04T13:24:33.7508718Z I1204 13:14:47.631000 455490 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 455561 2025-12-04T13:24:33.7508867Z I1204 13:14:47.632000 455490 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 455562 2025-12-04T13:24:33.7509446Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7509484Z _warn_cpu_init() 2025-12-04T13:24:33.7509827Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7509866Z _init_core_state( 2025-12-04T13:24:33.7510361Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7510425Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7511010Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7511048Z _warn_cpu_init() 2025-12-04T13:24:33.7511348Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7511384Z _init_core_state( 2025-12-04T13:24:33.7511887Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7511961Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7512545Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7512582Z _warn_cpu_init() 2025-12-04T13:24:33.7512878Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7512916Z _init_core_state( 2025-12-04T13:24:33.7513403Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7513463Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7513947Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
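The `_warn_cpu_init` warnings in this second test follow the same pattern: the wrapped module is still on CPU when FSDP runs sharding initialization. A sketch of the recommended call, with a toy module standing in for the real one:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    cpu_module = nn.Linear(8, 8)  # still on CPU, as in the warning
    # device_id moves the module to the GPU for sharding init, which is also
    # what sync_module_states=True needs, since it uses GPU communication.
    model = FSDP(cpu_module, device_id=torch.cuda.current_device(), sync_module_states=True)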
2025-12-04T13:24:33.7514005Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7514574Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7514611Z _warn_cpu_init() 2025-12-04T13:24:33.7515103Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7515162Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7515462Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7515506Z return func(*args, **kwargs) 2025-12-04T13:24:33.7515802Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7515838Z _init_core_state( 2025-12-04T13:24:33.7516332Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7516406Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7516900Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7516956Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7517187Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7517228Z return func(*args, **kwargs) 2025-12-04T13:24:33.7517456Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7517499Z return func(*args, **kwargs) 2025-12-04T13:24:33.7517721Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
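The full_state_dict warnings mean that under `NO_SHARD` there is nothing to gather, so `state_dict()` returns a full state dict regardless of the configured type. Making that choice explicit looks roughly like this (`model` is a hypothetical FSDP-wrapped module):

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
        state = model.state_dict()  # under NO_SHARD this is what is returned anyway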
2025-12-04T13:24:33.7517761Z return func(*args, **kwargs) 2025-12-04T13:24:33.7517981Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7518020Z return func(*args, **kwargs) 2025-12-04T13:24:33.7518239Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7518280Z return func(*args, **kwargs) 2025-12-04T13:24:33.7518501Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7518543Z return func(*args, **kwargs) 2025-12-04T13:24:33.7518761Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7518801Z return func(*args, **kwargs) 2025-12-04T13:24:33.7519019Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7519059Z return func(*args, **kwargs) 2025-12-04T13:24:33.7519205Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7519371Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7519679Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7519870Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7520156Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7520282Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7520575Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7520748Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7521025Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7521170Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7521447Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7521585Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7521865Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7522016Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7522540Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17469276160. 2025-12-04T13:24:33.7522657Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7522855Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7523252Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7523366Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7523577Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7523755Z [rank2]:E1204 13:15:20.420000 455561 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7523795Z dist init r=2, world=4 2025-12-04T13:24:33.7523935Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7524093Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7524385Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7524540Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7524835Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7524979Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7525255Z [rank3]:E1204 13:15:20.429000 455562 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7525402Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7525677Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7525824Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7526101Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7526237Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7526516Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7526666Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7527183Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512. 
2025-12-04T13:24:33.7527299Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7527494Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7527887Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7528010Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7528222Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7528386Z [rank3]:E1204 13:15:20.429000 455562 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7528425Z dist init r=3, world=4 2025-12-04T13:24:33.7528562Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7528721Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7529019Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7529185Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7529480Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7529604Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7529920Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7530068Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7530347Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7530493Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7530769Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7530903Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] 
with policy(): 2025-12-04T13:24:33.7531186Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7531337Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7531850Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17622368256. 2025-12-04T13:24:33.7531965Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7532160Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7532569Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7532684Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7532896Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7533060Z [rank0]:E1204 13:15:20.444000 455559 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7533098Z dist init r=0, world=4 2025-12-04T13:24:33.7533261Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7533433Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7533736Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7533889Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7534174Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7534298Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7534577Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7534726Z [rank1]:E1204 13:15:20.478000 455560 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7535001Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7535149Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7535424Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7535563Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7535845Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7535996Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7536514Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17486053376. 2025-12-04T13:24:33.7536645Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7536844Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7537236Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7537350Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7537572Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7537746Z [rank1]:E1204 13:15:20.478000 455560 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7537797Z dist init r=1, world=4 2025-12-04T13:24:33.7538134Z [rank3]:[W1204 13:15:20.104786221 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7538472Z [rank2]:[W1204 13:15:20.123913977 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7538802Z [rank0]:[W1204 13:15:20.162879709 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7539129Z [rank1]:[W1204 13:15:20.261143037 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7539170Z FAILED [47.0455s] [ 7%]
2025-12-04T13:24:33.7539172Z
2025-12-04T13:24:33.7539228Z =================================== FAILURES ===================================
2025-12-04T13:24:33.7539363Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _
2025-12-04T13:24:33.7539409Z Traceback (most recent call last):
2025-12-04T13:24:33.7539573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.7539618Z self._join_processes(fn)
2025-12-04T13:24:33.7539831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.7539886Z self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.7540064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.7540108Z raise RuntimeError(error)
2025-12-04T13:24:33.7540190Z RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.7540235Z Traceback (most recent call last):
2025-12-04T13:24:33.7540396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7540437Z getattr(self, test_name)()
2025-12-04T13:24:33.7540597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7540632Z fn()
2025-12-04T13:24:33.7540799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7540842Z method(*args, **kwargs)
2025-12-04T13:24:33.7540994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7541034Z method(*args, **kwargs)
2025-12-04T13:24:33.7541185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7541222Z with policy():
2025-12-04T13:24:33.7541373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7541416Z raise RuntimeError(msg)
2025-12-04T13:24:33.7541818Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512.
2025-12-04T13:24:33.7541846Z
2025-12-04T13:24:33.7541923Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7542192Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.7542194Z
2025-12-04T13:24:33.7542283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7542285Z
2025-12-04T13:24:33.7542287Z
2025-12-04T13:24:33.7542361Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.7542451Z Process 3 terminated with exit code 10, terminating remaining processes.
2025-12-04T13:24:33.7542688Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-77e2cce3f6d16ae3.xml -
2025-12-04T13:24:33.7542749Z =========================== short test summary info ============================
2025-12-04T13:24:33.7543034Z FAILED [47.0455s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.7543080Z Traceback (most recent call last):
2025-12-04T13:24:33.7543245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7543286Z getattr(self, test_name)()
2025-12-04T13:24:33.7543446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7543480Z fn()
2025-12-04T13:24:33.7543633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7543674Z method(*args, **kwargs)
2025-12-04T13:24:33.7543825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7543865Z method(*args, **kwargs)
2025-12-04T13:24:33.7544014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7544050Z with policy():
2025-12-04T13:24:33.7544202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7544242Z raise RuntimeError(msg)
2025-12-04T13:24:33.7544640Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512.
2025-12-04T13:24:33.7544644Z
2025-12-04T13:24:33.7544720Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7544984Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.7544987Z
2025-12-04T13:24:33.7545073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7545135Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
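The RuntimeError above is raised by the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: it snapshots both the caching allocator's counter and driver-level memory usage before each test and compares them afterwards, which is why the message reports two pairs of numbers (allocator 512 -> 80384 bytes; driver 2250244096 -> 17418944512 bytes on device 3). A minimal sketch of that idea follows, assuming a CUDA/ROCm device is available; this is an illustration only, not the actual CudaMemoryLeakCheck logic in torch/testing/_internal/common_utils.py, and run_with_leak_probe is a hypothetical helper name.

import torch

def run_with_leak_probe(test_fn, device: int = 0) -> None:
    # Snapshot caching-allocator and driver-level memory before the test body.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)     # caching allocator, bytes
    free_before, _total = torch.cuda.mem_get_info(device)  # driver-level free bytes
    test_fn()
    # Compare after the test; memory still held by the process is a leak candidate.
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _total = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible leak on device {device}: allocator "
            f"{alloc_before} -> {alloc_after} bytes, "
            f"driver free {free_before} -> {free_after} bytes"
        )

The real checker is more forgiving than this sketch (it accounts for caching-allocator reuse before declaring a leak), but the two-counter comparison is the mechanism behind the "Caching allocator allocated memory ... CUDA driver allocated memory ..." wording in the error.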
2025-12-04T13:24:33.7545197Z ======================= 1 failed, 8 deselected in 47.21s =======================
2025-12-04T13:24:33.7545234Z Got exit code 1
2025-12-04T13:24:33.7545274Z Retrying single test...
2025-12-04T13:24:33.7545470Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-512873455287838d.xml
2025-12-04T13:24:33.7545539Z ============================= test session starts ==============================
2025-12-04T13:24:33.7545663Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.7545704Z cachedir: .pytest_cache
2025-12-04T13:24:33.7545862Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.7545908Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.7545948Z configfile: pytest.ini
2025-12-04T13:24:33.7546111Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.7546185Z collecting ... collected 60 items / 20 deselected / 40 selected
2025-12-04T13:24:33.7546448Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.7546494Z Running 1 items in this shard
2025-12-04T13:24:33.7546496Z
2025-12-04T13:24:33.7546835Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 13:15:37.122000 456900 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 456969
2025-12-04T13:24:33.7546990Z I1204 13:15:37.122000 456900 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 456970
2025-12-04T13:24:33.7547142Z I1204 13:15:37.123000 456900 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 456971
2025-12-04T13:24:33.7547292Z I1204 13:15:37.124000 456900 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 456972
2025-12-04T13:24:33.7547870Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7547909Z _warn_cpu_init()
2025-12-04T13:24:33.7548212Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1.
2025-12-04T13:24:33.7548248Z _init_core_state(
2025-12-04T13:24:33.7548756Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.7548819Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.7549388Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7549425Z _warn_cpu_init()
2025-12-04T13:24:33.7549777Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1.
2025-12-04T13:24:33.7549828Z _init_core_state(
2025-12-04T13:24:33.7550333Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.7550395Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.7550964Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7551002Z _warn_cpu_init()
2025-12-04T13:24:33.7551302Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1.
2025-12-04T13:24:33.7551338Z _init_core_state(
2025-12-04T13:24:33.7551828Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.7551886Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.7552456Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7552494Z _warn_cpu_init()
2025-12-04T13:24:33.7552979Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.7553039Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.7553532Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.7553592Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.7553887Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1.
2025-12-04T13:24:33.7553925Z _init_core_state(
2025-12-04T13:24:33.7554424Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.7554501Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.7554790Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.7554832Z return func(*args, **kwargs)
2025-12-04T13:24:33.7555317Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.7555376Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.7555604Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7555647Z return func(*args, **kwargs)
2025-12-04T13:24:33.7555869Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7555911Z return func(*args, **kwargs)
2025-12-04T13:24:33.7556130Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
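The block of UserWarnings above repeats on every rank because the test passes `device_id` as a bare `cuda` device with no index, so FSDP has to guess from the current device. The fix the warning itself suggests looks like the following sketch; `rank` and `module` are illustrative placeholders, not names from the test, and an initialized default process group is assumed.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_for_rank(module: torch.nn.Module, rank: int) -> FSDP:
    # Pin this process to its GPU before FSDP initialization...
    torch.cuda.set_device(rank)
    # ...and hand FSDP an indexed device instead of a bare "cuda",
    # so it does not fall back to the current-device heuristic.
    return FSDP(module, device_id=torch.device("cuda", rank))

Passing an explicit `device_id` this way also addresses the _warn_cpu_init() warning above it, since FSDP can then move the CPU module to the right GPU for sharding initialization.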
2025-12-04T13:24:33.7556170Z return func(*args, **kwargs)
2025-12-04T13:24:33.7556390Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7556431Z return func(*args, **kwargs)
2025-12-04T13:24:33.7556653Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7556693Z return func(*args, **kwargs)
2025-12-04T13:24:33.7556911Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7556951Z return func(*args, **kwargs)
2025-12-04T13:24:33.7557168Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7557208Z return func(*args, **kwargs)
2025-12-04T13:24:33.7557437Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7557480Z return func(*args, **kwargs)
2025-12-04T13:24:33.7557625Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.7557790Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.7558079Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7558234Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.7558530Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7558674Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.7558953Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7559102Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7559379Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7559528Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7559836Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7559973Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7560250Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7560398Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7560915Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512. 2025-12-04T13:24:33.7561032Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7561230Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7561628Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7561756Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7561969Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7562133Z [rank3]:E1204 13:16:09.949000 456972 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7562172Z dist init r=3, world=4 2025-12-04T13:24:33.7562310Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7562470Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7562771Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7562958Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7563243Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7563367Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7563644Z [rank1]:E1204 13:16:09.975000 456970 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7563794Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7564070Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7564218Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7564494Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7564629Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7564909Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7565061Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7565574Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17486053376. 
2025-12-04T13:24:33.7565689Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7565887Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7566295Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7566409Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7566619Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7566782Z [rank1]:E1204 13:16:09.975000 456970 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7566820Z dist init r=1, world=4 2025-12-04T13:24:33.7566967Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7567138Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7567435Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7567587Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7567871Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7567994Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7568276Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7568425Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7568703Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7568849Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7569126Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7569264Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] 
with policy(): 2025-12-04T13:24:33.7569541Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7569742Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7570268Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17622368256. 2025-12-04T13:24:33.7570383Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7570580Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7570975Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7571088Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7571311Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7571488Z [rank0]:E1204 13:16:09.992000 456969 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7571538Z dist init r=0, world=4 2025-12-04T13:24:33.7571676Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7571834Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7572120Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7572275Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7572559Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7572684Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7572961Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7573110Z [rank2]:E1204 13:16:10.003000 456971 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7573388Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7573537Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7573814Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7573948Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7574225Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7574373Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7574895Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17469276160. 2025-12-04T13:24:33.7575009Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7575205Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7575610Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7575743Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7575953Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7576115Z [rank2]:E1204 13:16:10.003000 456971 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7576153Z dist init r=2, world=4 2025-12-04T13:24:33.7576487Z [rank3]:[W1204 13:16:10.660702481 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7576819Z [rank1]:[W1204 13:16:10.720701734 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7577148Z [rank0]:[W1204 13:16:10.765214584 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7577474Z [rank2]:[W1204 13:16:10.791019068 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7577516Z FAILED [47.2464s] [100%]
2025-12-04T13:24:33.7577518Z
2025-12-04T13:24:33.7577575Z =================================== FAILURES ===================================
2025-12-04T13:24:33.7577711Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _
2025-12-04T13:24:33.7577757Z Traceback (most recent call last):
2025-12-04T13:24:33.7577921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.7577964Z self._join_processes(fn)
2025-12-04T13:24:33.7578139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.7578192Z self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.7578371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.7578415Z raise RuntimeError(error)
2025-12-04T13:24:33.7578495Z RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.7578540Z Traceback (most recent call last):
2025-12-04T13:24:33.7578713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7578757Z getattr(self, test_name)()
2025-12-04T13:24:33.7578914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7578949Z fn()
2025-12-04T13:24:33.7579100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7579141Z method(*args, **kwargs)
2025-12-04T13:24:33.7579291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7579331Z method(*args, **kwargs)
2025-12-04T13:24:33.7579490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7579543Z with policy():
2025-12-04T13:24:33.7579735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7579795Z raise RuntimeError(msg)
2025-12-04T13:24:33.7580185Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512.
2025-12-04T13:24:33.7580188Z
2025-12-04T13:24:33.7580263Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7580532Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.7580535Z
2025-12-04T13:24:33.7580623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7580626Z
2025-12-04T13:24:33.7580629Z
2025-12-04T13:24:33.7580703Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.7580791Z Process 3 terminated with exit code 10, terminating remaining processes.
2025-12-04T13:24:33.7581022Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-512873455287838d.xml -
2025-12-04T13:24:33.7581081Z =========================== short test summary info ============================
2025-12-04T13:24:33.7581362Z FAILED [47.2464s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.7581407Z Traceback (most recent call last):
2025-12-04T13:24:33.7581574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7581619Z getattr(self, test_name)()
2025-12-04T13:24:33.7581778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7581813Z fn()
2025-12-04T13:24:33.7581964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7582004Z method(*args, **kwargs)
2025-12-04T13:24:33.7582153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7582192Z method(*args, **kwargs)
2025-12-04T13:24:33.7582342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7582380Z with policy():
2025-12-04T13:24:33.7582545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7582588Z raise RuntimeError(msg)
2025-12-04T13:24:33.7582974Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512.
2025-12-04T13:24:33.7582976Z
2025-12-04T13:24:33.7583052Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7583317Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.7583319Z
2025-12-04T13:24:33.7583420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7583496Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
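Independent of the leak itself, every run above ends with the ProcessGroupNCCL warning that destroy_process_group() was never called before exit. The shutdown pattern its linked documentation describes, sketched below for a spawn-style worker on recent PyTorch; the "nccl" backend maps to RCCL on this ROCm runner, and env:// rendezvous via MASTER_ADDR/MASTER_PORT is assumed rather than shown.

import torch
import torch.distributed as dist

def worker(rank: int, world_size: int) -> None:
    # An indexed device_id here also silences the barrier() warning
    # ("using the device under current context") seen in the log above.
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),
    )
    try:
        dist.barrier()  # stand-in for the real test body
    finally:
        # Explicit teardown, per the warning's linked docs:
        # https://pytorch.org/docs/stable/distributed.html#shutdown
        dist.destroy_process_group()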
2025-12-04T13:24:33.7583573Z ====================== 1 failed, 20 deselected in 47.41s ======================= 2025-12-04T13:24:33.7583611Z Got exit code 1 2025-12-04T13:24:33.7583651Z Retrying single test... 2025-12-04T13:24:33.7583841Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1bc8ad486f3988d2.xml 2025-12-04T13:24:33.7583897Z ============================= test session starts ============================== 2025-12-04T13:24:33.7584009Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7584050Z cachedir: .pytest_cache 2025-12-04T13:24:33.7584208Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7584254Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7584295Z configfile: pytest.ini 2025-12-04T13:24:33.7584457Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7584534Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7584794Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7584838Z Running 1 items in this shard 2025-12-04T13:24:33.7584840Z 2025-12-04T13:24:33.7585179Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 13:16:26.824000 458310 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 458379 2025-12-04T13:24:33.7585334Z I1204 13:16:26.824000 458310 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 458380 2025-12-04T13:24:33.7585487Z I1204 13:16:26.825000 458310 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 458381 2025-12-04T13:24:33.7585638Z I1204 13:16:26.825000 458310 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 458382 2025-12-04T13:24:33.7586213Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7586250Z _warn_cpu_init() 2025-12-04T13:24:33.7586563Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7586602Z _init_core_state( 2025-12-04T13:24:33.7587096Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7587158Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7587736Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7587793Z _warn_cpu_init() 2025-12-04T13:24:33.7588092Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7588129Z _init_core_state( 2025-12-04T13:24:33.7588619Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7588680Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7589249Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7589286Z _warn_cpu_init() 2025-12-04T13:24:33.7589589Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7589625Z _init_core_state( 2025-12-04T13:24:33.7590159Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7590220Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7590782Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7590819Z _warn_cpu_init() 2025-12-04T13:24:33.7591319Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7591378Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7591864Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7591921Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7592232Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:24:33.7592294Z _init_core_state( 2025-12-04T13:24:33.7592777Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7592834Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7593122Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7593165Z return func(*args, **kwargs) 2025-12-04T13:24:33.7593652Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.7593710Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.7593937Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7593980Z return func(*args, **kwargs) 2025-12-04T13:24:33.7594205Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7594246Z return func(*args, **kwargs) 2025-12-04T13:24:33.7594468Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
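[note] The barrier() warning points at the same knob one level down: binding the process group itself to a device at init time lets collectives stop guessing which GPU to use. A minimal sketch, assuming a PyTorch version in which `init_process_group` accepts the `device_id` argument (recent releases do):

    import os
    import torch
    import torch.distributed as dist

    rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(
        backend="nccl",
        device_id=torch.device(f"cuda:{rank}"),  # pin the group to this GPU up front
    )
    dist.barrier()  # no "using the device under current context" warning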
2025-12-04T13:24:33.7594510Z return func(*args, **kwargs) 2025-12-04T13:24:33.7594729Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7594770Z return func(*args, **kwargs) 2025-12-04T13:24:33.7594987Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7595028Z return func(*args, **kwargs) 2025-12-04T13:24:33.7595248Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7595298Z return func(*args, **kwargs) 2025-12-04T13:24:33.7595519Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7595559Z return func(*args, **kwargs) 2025-12-04T13:24:33.7595778Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7595817Z return func(*args, **kwargs) 2025-12-04T13:24:33.7595963Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7596125Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7596437Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7596615Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7596902Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7597028Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7597307Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7597456Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7597735Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7597882Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7598158Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7598294Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7598573Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7598721Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7599241Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17486053376. 2025-12-04T13:24:33.7599357Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7599564Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7600002Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7600116Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7600328Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7600492Z [rank1]:E1204 13:16:59.542000 458380 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7600544Z dist init r=1, world=4 2025-12-04T13:24:33.7600683Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7600869Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7601154Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7601309Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7601593Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7601719Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7601996Z [rank2]:E1204 13:16:59.549000 458381 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7602143Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7602419Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7602564Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7602842Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7602978Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7603256Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7603404Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7603934Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17469276160. 
2025-12-04T13:24:33.7604050Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7604245Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7604640Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7604753Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7604973Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7605156Z [rank2]:E1204 13:16:59.549000 458381 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7605293Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7605452Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7605738Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7605894Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7606181Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7606306Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7606583Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7606728Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7607007Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7607153Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7607430Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7607565Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7607842Z 
[rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7607990Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7608515Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512. 2025-12-04T13:24:33.7608631Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7608825Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7609229Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7609360Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7609569Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7609780Z [rank3]:E1204 13:16:59.549000 458382 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7609819Z dist init r=2, world=4 2025-12-04T13:24:33.7609857Z dist init r=3, world=4 2025-12-04T13:24:33.7609994Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7610155Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7610442Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7610599Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7610883Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7611008Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7611287Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7611435Z [rank0]:E1204 13:16:59.639000 458379 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7611714Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7611860Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7612136Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7612272Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7612570Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7612721Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7613235Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17622368256. 2025-12-04T13:24:33.7613361Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7613569Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7613978Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7614091Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7614301Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7614467Z [rank0]:E1204 13:16:59.639000 458379 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7614507Z dist init r=0, world=4 2025-12-04T13:24:33.7614841Z [rank1]:[W1204 13:16:59.215513473 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7615172Z [rank2]:[W1204 13:16:59.236858477 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7615499Z [rank3]:[W1204 13:16:59.237216391 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7615827Z [rank0]:[W1204 13:16:59.395636069 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7615869Z FAILED [47.1488s] [100%] 2025-12-04T13:24:33.7615871Z 2025-12-04T13:24:33.7615927Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7616059Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.7616106Z Traceback (most recent call last): 2025-12-04T13:24:33.7616269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7616313Z self._join_processes(fn) 2025-12-04T13:24:33.7616488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7616553Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7616730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7616776Z raise RuntimeError(error) 2025-12-04T13:24:33.7616855Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7616901Z Traceback (most recent call last): 2025-12-04T13:24:33.7617060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7617104Z getattr(self, test_name)() 2025-12-04T13:24:33.7617262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7617298Z fn() 2025-12-04T13:24:33.7617461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7617513Z method(*args, **kwargs) 2025-12-04T13:24:33.7617666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7617717Z method(*args, **kwargs) 2025-12-04T13:24:33.7617866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7617903Z with policy(): 2025-12-04T13:24:33.7618055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7618096Z raise RuntimeError(msg) 2025-12-04T13:24:33.7618484Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17486053376. 
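[note] The RuntimeError above comes from the harness's memory-leak guard, which snapshots allocator and driver counters around the test body and fails when they grow; the numbers in the message are those before/after readings. The real check lives in torch/testing/_internal/common_utils.py; the following is only a simplified sketch of the same before/after idea built on public counters, not the actual implementation (real driver-level numbers can fluctuate for benign reasons, which is why the real check is more involved):

    import torch

    def run_with_leak_check(fn, device=0):
        # Simplified illustration of a before/after CUDA memory check.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)     # caching allocator bytes
        free_before, _total = torch.cuda.mem_get_info(device)  # driver-level free bytes
        fn()
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible leak: allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver free {free_before} -> {free_after} bytes"
            )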
2025-12-04T13:24:33.7618487Z 2025-12-04T13:24:33.7618563Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7618832Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7618835Z 2025-12-04T13:24:33.7618922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7618924Z 2025-12-04T13:24:33.7618983Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7619028Z Traceback (most recent call last): 2025-12-04T13:24:33.7619191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7619233Z getattr(self, test_name)() 2025-12-04T13:24:33.7619393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7619431Z fn() 2025-12-04T13:24:33.7619581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7619622Z method(*args, **kwargs) 2025-12-04T13:24:33.7619804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7619845Z method(*args, **kwargs) 2025-12-04T13:24:33.7619994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7620032Z with policy(): 2025-12-04T13:24:33.7620182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7620223Z raise RuntimeError(msg) 2025-12-04T13:24:33.7620621Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512. 2025-12-04T13:24:33.7620625Z 2025-12-04T13:24:33.7620700Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7620965Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7620967Z 2025-12-04T13:24:33.7621055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7621057Z 2025-12-04T13:24:33.7621059Z 2025-12-04T13:24:33.7621134Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7621234Z Process 1 terminated with exit code 10, terminating remaining processes. 
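[note] Separately from the leak itself, the repeated ProcessGroupNCCL warnings above flag a shutdown-hygiene issue: each worker should tear the process group down explicitly rather than relying on interpreter exit. A minimal sketch of the pattern the warning (and the linked shutdown docs) recommend:

    import torch.distributed as dist

    def worker_main():
        dist.init_process_group(backend="nccl")
        try:
            ...  # training / test body goes here
        finally:
            dist.destroy_process_group()  # explicit shutdown silences the NCCL warning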
2025-12-04T13:24:33.7621486Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1bc8ad486f3988d2.xml - 2025-12-04T13:24:33.7621565Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7621850Z FAILED [47.1488s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7621896Z Traceback (most recent call last): 2025-12-04T13:24:33.7622059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7622101Z getattr(self, test_name)() 2025-12-04T13:24:33.7622261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7622296Z fn() 2025-12-04T13:24:33.7622448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7622488Z method(*args, **kwargs) 2025-12-04T13:24:33.7622638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7622676Z method(*args, **kwargs) 2025-12-04T13:24:33.7622826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7622861Z with policy(): 2025-12-04T13:24:33.7623013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7623055Z raise RuntimeError(msg) 2025-12-04T13:24:33.7623442Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17486053376. 
2025-12-04T13:24:33.7623446Z 2025-12-04T13:24:33.7623519Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7623785Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7623787Z 2025-12-04T13:24:33.7623874Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7623876Z 2025-12-04T13:24:33.7623933Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7623978Z Traceback (most recent call last): 2025-12-04T13:24:33.7624140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7624183Z getattr(self, test_name)() 2025-12-04T13:24:33.7624351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7624387Z fn() 2025-12-04T13:24:33.7624538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7624578Z method(*args, **kwargs) 2025-12-04T13:24:33.7624728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7624766Z method(*args, **kwargs) 2025-12-04T13:24:33.7624915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7624951Z with policy(): 2025-12-04T13:24:33.7625111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7625152Z raise RuntimeError(msg) 2025-12-04T13:24:33.7625548Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17418944512. 2025-12-04T13:24:33.7625571Z 2025-12-04T13:24:33.7625643Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7625908Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7625910Z 2025-12-04T13:24:33.7625996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7626060Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
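[note] For context on the earlier "Retrying single test..." and the "FAILED CONSISTENTLY" verdict that follows: the runner reruns a failing test in isolation, and only when the rerun also fails does it record a consistent failure and, because continue-through-error is set, move on to the remaining tests. A hypothetical sketch of that control flow; the actual logic lives in PyTorch's test runner and differs in detail:

    import subprocess

    def run_until_consistent(cmd, retries=1):
        # Hypothetical retry-then-continue loop, not the real runner.
        for _attempt in range(retries + 1):
            if subprocess.run(cmd).returncode == 0:
                return True                   # passed (an earlier attempt was flaky)
        print("FAILED CONSISTENTLY:", " ".join(cmd))
        return False                          # continue-through-error: caller moves on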
2025-12-04T13:24:33.7626124Z ====================== 1 failed, 20 deselected in 47.29s ======================= 2025-12-04T13:24:33.7626163Z Got exit code 1 2025-12-04T13:24:33.7626382Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7626509Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.7626696Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-149966473123f376.xml 2025-12-04T13:24:33.7626753Z ============================= test session starts ============================== 2025-12-04T13:24:33.7626864Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7626905Z cachedir: .pytest_cache 2025-12-04T13:24:33.7627064Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7627111Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7627152Z configfile: pytest.ini 2025-12-04T13:24:33.7627314Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7627388Z collecting ... collected 60 items / 9 deselected / 51 selected 2025-12-04T13:24:33.7627440Z stepcurrent: skipping 9 already run items. 2025-12-04T13:24:33.7627483Z Running 12 items in this shard 2025-12-04T13:24:33.7627485Z 2025-12-04T13:24:33.7627798Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 13:17:16.397000 459720 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 459789 2025-12-04T13:24:33.7627952Z I1204 13:17:16.398000 459720 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 459790 2025-12-04T13:24:33.7628118Z I1204 13:17:16.398000 459720 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 459791 2025-12-04T13:24:33.7628269Z I1204 13:17:16.399000 459720 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 459792 2025-12-04T13:24:33.7628851Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7628888Z _warn_cpu_init() 2025-12-04T13:24:33.7629466Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7629525Z _warn_cpu_init() 2025-12-04T13:24:33.7630128Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7630165Z _warn_cpu_init() 2025-12-04T13:24:33.7630730Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7630768Z _warn_cpu_init() 2025-12-04T13:24:33.7631060Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7631103Z return func(*args, **kwargs) 2025-12-04T13:24:33.7631246Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7631409Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7631700Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7631856Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7632142Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7632266Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7632566Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7632714Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7632991Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7633139Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7633416Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7633568Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7633869Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7634017Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7634502Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.7634619Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7634817Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7635180Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7635295Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7635507Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7635674Z [rank3]:E1204 13:17:24.170000 459792 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7635714Z dist init r=3, world=4 2025-12-04T13:24:33.7635853Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7636013Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7636300Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7636454Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7636739Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7636873Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7637150Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7637298Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7637573Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7637729Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7638018Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7638163Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7638441Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7638588Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7639072Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 
2025-12-04T13:24:33.7639189Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7639382Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7639790Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7639903Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7640117Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7640283Z [rank1]:E1204 13:17:24.180000 459790 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7640322Z dist init r=1, world=4 2025-12-04T13:24:33.7640460Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7640620Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7640906Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7641073Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7641359Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7641481Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7641758Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7641904Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7642194Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7642372Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7642648Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7642784Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7643062Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7643211Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7643692Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:24:33.7643807Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7644002Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7644364Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7644478Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7644689Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7644853Z [rank2]:E1204 13:17:24.191000 459791 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7644891Z dist init r=2, world=4 2025-12-04T13:24:33.7645029Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7645189Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7645486Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7645640Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7645923Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7646047Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7646332Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7646489Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7646773Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7646919Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7647195Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7647333Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7647612Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7647759Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7648243Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:24:33.7648358Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7648554Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7648917Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7649030Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7649240Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7649405Z [rank0]:E1204 13:17:24.195000 459789 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7649444Z dist init r=0, world=4 2025-12-04T13:24:33.7649844Z [rank0]:[W1204 13:17:24.947992630 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7649886Z FAILED [9.7156s] [ 8%] 2025-12-04T13:24:33.7649888Z 2025-12-04T13:24:33.7649943Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7650046Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T13:24:33.7650091Z Traceback (most recent call last): 2025-12-04T13:24:33.7650253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7650298Z self._join_processes(fn) 2025-12-04T13:24:33.7650482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7650550Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7650741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7650786Z raise RuntimeError(error) 2025-12-04T13:24:33.7650866Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7650911Z Traceback (most recent call last): 2025-12-04T13:24:33.7651071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7651114Z getattr(self, test_name)() 2025-12-04T13:24:33.7651271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7651306Z fn() 2025-12-04T13:24:33.7651457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7651501Z method(*args, **kwargs) 2025-12-04T13:24:33.7651651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7651691Z method(*args, **kwargs) 2025-12-04T13:24:33.7651843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7651880Z with policy(): 2025-12-04T13:24:33.7652031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7652072Z raise RuntimeError(msg) 2025-12-04T13:24:33.7652429Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 
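Annotation: the numbers in the RuntimeError above come from two independent gauges. The caching allocator's tensor-level accounting grows from 512 bytes to roughly 60-70 KB per rank, and the driver-level accounting grows by about 1.5 GB per device; the check only reports "driver API confirmed a leak" when the driver delta corroborates the allocator delta. Below is a minimal sketch of that kind of before/after accounting, assuming one visible GPU — an illustration of the measurement idea, not the actual CudaMemoryLeakCheck implementation. The repro command printed next reruns exactly this test with the same check enabled.

import torch

def run_workload():
    # assumption: stand-in for the test body under measurement
    x = torch.randn(1024, device="cuda")
    del x

dev = torch.cuda.current_device()
torch.cuda.empty_cache()
alloc_before = torch.cuda.memory_allocated(dev)    # caching-allocator bytes in use
free_before, total = torch.cuda.mem_get_info(dev)  # driver-level free/total bytes
driver_before = total - free_before

run_workload()

torch.cuda.synchronize(dev)
torch.cuda.empty_cache()
alloc_after = torch.cuda.memory_allocated(dev)
free_after, _ = torch.cuda.mem_get_info(dev)
driver_after = total - free_after

# only flag a leak when the driver-level gauge confirms the allocator-level one,
# mirroring the "driver API confirmed" wording in the log above
if alloc_after > alloc_before and driver_after > driver_before:
    raise RuntimeError(
        f"possible leak: allocator {alloc_before} -> {alloc_after} bytes, "
        f"driver {driver_before} -> {driver_after} bytes")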
2025-12-04T13:24:33.7652434Z 2025-12-04T13:24:33.7652508Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7652746Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7652749Z 2025-12-04T13:24:33.7652834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7652837Z 2025-12-04T13:24:33.7652838Z 2025-12-04T13:24:33.7652912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7652998Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7653229Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-149966473123f376.xml - 2025-12-04T13:24:33.7653299Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7653553Z FAILED [9.7156s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7653600Z Traceback (most recent call last): 2025-12-04T13:24:33.7653763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7653808Z getattr(self, test_name)() 2025-12-04T13:24:33.7653966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7654001Z fn() 2025-12-04T13:24:33.7654152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7654204Z method(*args, **kwargs) 2025-12-04T13:24:33.7654355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7654421Z method(*args, **kwargs) 2025-12-04T13:24:33.7654569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7654605Z with policy(): 2025-12-04T13:24:33.7654755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7654796Z raise RuntimeError(msg) 2025-12-04T13:24:33.7655151Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.7655153Z 2025-12-04T13:24:33.7655229Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7655466Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7655469Z 2025-12-04T13:24:33.7655556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7655618Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
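Annotation: the parent-process traceback above (wrapper -> _join_processes -> _check_return_codes) shows the harness shape: spawn one worker per rank, join them, and raise if any worker exited nonzero — here exit code 10, the leak-check failure code, after which the remaining processes are terminated. A generic sketch of that spawn-and-join pattern using torch.multiprocessing (illustrative only, not the internal common_distributed harness):

import sys
import torch.multiprocessing as mp

LEAK_EXIT_CODE = 10  # matches the exit code reported throughout this log

def worker(rank: int, world_size: int) -> None:
    # assumption: stand-in for the per-rank test body
    leaked = False
    if leaked:
        sys.exit(LEAK_EXIT_CODE)

if __name__ == "__main__":
    # with join=True, spawn() raises ProcessExitedException when any rank
    # exits nonzero, analogous to the RuntimeError raised by the harness above
    mp.spawn(worker, args=(4,), nprocs=4, join=True)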
2025-12-04T13:24:33.7655679Z ======================= 1 failed, 9 deselected in 9.88s ======================== 2025-12-04T13:24:33.7655716Z Got exit code 1 2025-12-04T13:24:33.7655756Z Retrying single test... 2025-12-04T13:24:33.7655945Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-764bcb219df428f9.xml 2025-12-04T13:24:33.7656002Z ============================= test session starts ============================== 2025-12-04T13:24:33.7656115Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7656157Z cachedir: .pytest_cache 2025-12-04T13:24:33.7656318Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7656363Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7656403Z configfile: pytest.ini 2025-12-04T13:24:33.7656565Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7656640Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7656868Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7656913Z Running 1 items in this shard 2025-12-04T13:24:33.7656916Z 2025-12-04T13:24:33.7657235Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 13:17:28.626000 460122 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 460191 2025-12-04T13:24:33.7657392Z I1204 13:17:28.627000 460122 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 460192 2025-12-04T13:24:33.7657542Z I1204 13:17:28.627000 460122 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 460193 2025-12-04T13:24:33.7657692Z I1204 13:17:28.627000 460122 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 460194 2025-12-04T13:24:33.7658286Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7658342Z _warn_cpu_init() 2025-12-04T13:24:33.7658909Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7658945Z _warn_cpu_init() 2025-12-04T13:24:33.7659508Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7659547Z _warn_cpu_init() 2025-12-04T13:24:33.7660151Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7660188Z _warn_cpu_init() 2025-12-04T13:24:33.7660479Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7660524Z return func(*args, **kwargs) 2025-12-04T13:24:33.7660669Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7660833Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7661125Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7661281Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7661586Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7661712Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7661990Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7662137Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7662414Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7662573Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7662863Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7663014Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7663291Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7663439Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7663928Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:24:33.7664046Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7664243Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7664606Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7664721Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7664933Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7665100Z [rank1]:E1204 13:17:36.379000 460192 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7665138Z dist init r=1, world=4 2025-12-04T13:24:33.7665277Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7665436Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7665724Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7665892Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7666177Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7666302Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7666578Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7666726Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7667013Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7667178Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7667453Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7667589Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7667868Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7668016Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7668502Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 
2025-12-04T13:24:33.7668616Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7668811Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7669174Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7669288Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7669499Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7669661Z [rank0]:E1204 13:17:36.389000 460191 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7669740Z dist init r=0, world=4 2025-12-04T13:24:33.7669878Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7670039Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7670341Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7670496Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7670781Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7670903Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7671192Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7671362Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7671640Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7671786Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7672060Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7672197Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7672476Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7672624Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7673108Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:24:33.7673223Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7673420Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7673780Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7673892Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7674101Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7674265Z [rank2]:E1204 13:17:36.393000 460193 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7674304Z dist init r=2, world=4 2025-12-04T13:24:33.7674456Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7674616Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7674901Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7675055Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7675350Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7675484Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7675770Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7675916Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7676191Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7676339Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7676617Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7676752Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7677030Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7677177Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7677665Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.7677779Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7677975Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7678336Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7678448Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7678670Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7678834Z [rank3]:E1204 13:17:36.462000 460194 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7678873Z dist init r=3, world=4 2025-12-04T13:24:33.7679208Z [rank0]:[W1204 13:17:36.084316727 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7679248Z FAILED [9.6177s] [100%] 2025-12-04T13:24:33.7679250Z 2025-12-04T13:24:33.7679305Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7679416Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T13:24:33.7679462Z Traceback (most recent call last): 2025-12-04T13:24:33.7679636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7679724Z self._join_processes(fn) 2025-12-04T13:24:33.7679899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7679953Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7680131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7680175Z raise RuntimeError(error) 2025-12-04T13:24:33.7680254Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7680299Z Traceback (most recent call last): 2025-12-04T13:24:33.7680459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7680503Z getattr(self, test_name)() 2025-12-04T13:24:33.7680660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7680696Z fn() 2025-12-04T13:24:33.7680847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7680888Z method(*args, **kwargs) 2025-12-04T13:24:33.7681038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7681078Z method(*args, **kwargs) 2025-12-04T13:24:33.7681226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7681264Z with policy(): 2025-12-04T13:24:33.7681415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7681458Z raise RuntimeError(msg) 2025-12-04T13:24:33.7681814Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 
2025-12-04T13:24:33.7681818Z 2025-12-04T13:24:33.7681892Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7682129Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7682131Z 2025-12-04T13:24:33.7682217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7682219Z 2025-12-04T13:24:33.7682221Z 2025-12-04T13:24:33.7682297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7682400Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7682635Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-764bcb219df428f9.xml - 2025-12-04T13:24:33.7682695Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7682948Z FAILED [9.6177s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7682994Z Traceback (most recent call last): 2025-12-04T13:24:33.7683157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7683199Z getattr(self, test_name)() 2025-12-04T13:24:33.7683371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7683420Z fn() 2025-12-04T13:24:33.7683586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7683626Z method(*args, **kwargs) 2025-12-04T13:24:33.7683776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7683815Z method(*args, **kwargs) 2025-12-04T13:24:33.7683963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7684000Z with policy(): 2025-12-04T13:24:33.7684150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7684190Z raise RuntimeError(msg) 2025-12-04T13:24:33.7684553Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:24:33.7684557Z 2025-12-04T13:24:33.7684632Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7684867Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7684869Z 2025-12-04T13:24:33.7684956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7685018Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
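Annotation: separately from the leak itself, each session above emits the same FSDP UserWarning once per rank — the wrapped module is still on CPU when sharding initialization runs — and it recurs verbatim in the retry below. The warning's own suggestion is to pass device_id. A minimal sketch of that fix, assuming an already-initialized process group and one GPU per rank (illustrative, not the test's actual wrapping code):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

module = nn.Linear(8, 8)  # CPU-resident module, the case that triggers _warn_cpu_init()
wrapped = FSDP(
    module,
    device_id=torch.cuda.current_device(),  # moves the module to GPU for sharding init
    sync_module_states=True,                # requires the module on a GPU device
)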
2025-12-04T13:24:33.7685078Z ======================= 1 failed, 20 deselected in 9.78s ======================= 2025-12-04T13:24:33.7685115Z Got exit code 1 2025-12-04T13:24:33.7685155Z Retrying single test... 2025-12-04T13:24:33.7685347Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f0b0cc29b6fdaf42.xml 2025-12-04T13:24:33.7685404Z ============================= test session starts ============================== 2025-12-04T13:24:33.7685516Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7685557Z cachedir: .pytest_cache 2025-12-04T13:24:33.7685714Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7685759Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7685799Z configfile: pytest.ini 2025-12-04T13:24:33.7685958Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7686033Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7686271Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7686317Z Running 1 items in this shard 2025-12-04T13:24:33.7686319Z 2025-12-04T13:24:33.7686633Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 13:17:40.969000 460524 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 460593 2025-12-04T13:24:33.7686788Z I1204 13:17:40.970000 460524 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 460594 2025-12-04T13:24:33.7686940Z I1204 13:17:40.970000 460524 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 460595 2025-12-04T13:24:33.7687098Z I1204 13:17:40.971000 460524 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 460596 2025-12-04T13:24:33.7687684Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7687741Z _warn_cpu_init() 2025-12-04T13:24:33.7688305Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7688343Z _warn_cpu_init() 2025-12-04T13:24:33.7688908Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7688946Z _warn_cpu_init() 2025-12-04T13:24:33.7689511Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7689550Z _warn_cpu_init() 2025-12-04T13:24:33.7689882Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7689926Z return func(*args, **kwargs) 2025-12-04T13:24:33.7690068Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7690229Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7690517Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7690685Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7690970Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7691093Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7691373Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7691533Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7691823Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7691983Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7692259Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7692395Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7692673Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7692823Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7693307Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.7693423Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7693620Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7693985Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7694100Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7694310Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7694474Z [rank3]:E1204 13:17:48.626000 460596 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7694512Z dist init r=3, world=4 2025-12-04T13:24:33.7694650Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7694820Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7695110Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7695264Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7695547Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7695671Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7695961Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7696130Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7696406Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7696552Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7696828Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7696965Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7697244Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7697390Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7697871Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 
2025-12-04T13:24:33.7697986Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7698182Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7698548Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7698661Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7698872Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7699036Z [rank1]:E1204 13:17:48.628000 460594 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7699087Z dist init r=1, world=4 2025-12-04T13:24:33.7699225Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7699383Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7699670Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7699858Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7700158Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7700293Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7700584Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7700731Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7701012Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7701160Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7701437Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7701573Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7701850Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7701999Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7702481Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:24:33.7702596Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7702790Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7703156Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7703270Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7703495Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7703660Z [rank0]:E1204 13:17:48.635000 460593 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7703698Z dist init r=0, world=4 2025-12-04T13:24:33.7703835Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7703993Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7704288Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7704443Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7704753Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7704877Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7705152Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7705299Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7705578Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7705727Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7706003Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7706137Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7706416Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7706564Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7707046Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:24:33.7707159Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7707354Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7707729Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7707843Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7708053Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7708216Z [rank2]:E1204 13:17:48.730000 460595 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7708254Z dist init r=2, world=4 2025-12-04T13:24:33.7708599Z [rank0]:[W1204 13:17:48.338430993 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7708649Z FAILED [9.5195s] [100%] 2025-12-04T13:24:33.7708651Z 2025-12-04T13:24:33.7708716Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7708817Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T13:24:33.7708862Z Traceback (most recent call last): 2025-12-04T13:24:33.7709025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7709068Z self._join_processes(fn) 2025-12-04T13:24:33.7709240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7709294Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7709471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7709517Z raise RuntimeError(error) 2025-12-04T13:24:33.7709596Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7709642Z Traceback (most recent call last): 2025-12-04T13:24:33.7709836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7709878Z getattr(self, test_name)() 2025-12-04T13:24:33.7710036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7710071Z fn() 2025-12-04T13:24:33.7710221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7710262Z method(*args, **kwargs) 2025-12-04T13:24:33.7710412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7710452Z method(*args, **kwargs) 2025-12-04T13:24:33.7710603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7710642Z with policy(): 2025-12-04T13:24:33.7710793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7710834Z raise RuntimeError(msg) 2025-12-04T13:24:33.7711191Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 
2025-12-04T13:24:33.7711194Z 2025-12-04T13:24:33.7711268Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7711517Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7711521Z 2025-12-04T13:24:33.7711609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7711611Z 2025-12-04T13:24:33.7711671Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7711715Z Traceback (most recent call last): 2025-12-04T13:24:33.7711878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7711919Z getattr(self, test_name)() 2025-12-04T13:24:33.7712078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7712112Z fn() 2025-12-04T13:24:33.7712265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7712318Z method(*args, **kwargs) 2025-12-04T13:24:33.7712482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7712536Z method(*args, **kwargs) 2025-12-04T13:24:33.7712687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7712723Z with policy(): 2025-12-04T13:24:33.7712874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7712914Z raise RuntimeError(msg) 2025-12-04T13:24:33.7713270Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.7713272Z 2025-12-04T13:24:33.7713346Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7713582Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7713585Z 2025-12-04T13:24:33.7713671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7713673Z 2025-12-04T13:24:33.7713674Z 2025-12-04T13:24:33.7713749Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7713837Z Process 0 terminated with exit code 10, terminating remaining processes. 
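The "Process 0 terminated with exit code 10, terminating remaining processes." line above is the multiprocess test harness at work: common_distributed.py spawns one worker per rank, joins them, and converts any non-zero worker exit into the "Process N exited with error code 10" RuntimeError shown in the traceback. A minimal sketch of that shape using the public torch.multiprocessing API; the worker body and constants below are illustrative placeholders, not the harness's actual code.

    import sys
    import torch.multiprocessing as mp

    WORLD_SIZE = 4
    LEAK_EXIT_CODE = 10  # matches "exiting process N with exit code: 10" above

    def worker(rank: int, world_size: int) -> None:
        # Placeholder per-rank body; the real harness runs the test method
        # here and exits with LEAK_EXIT_CODE when the mem-leak check trips.
        print(f"dist init r={rank}, world={world_size}")
        if rank == 0:
            sys.exit(LEAK_EXIT_CODE)

    if __name__ == "__main__":
        # join=True makes the parent wait for all workers and raise on any
        # non-zero exit, which is how exit code 10 surfaces as the
        # "Process 0 exited with error code 10" error in the report above.
        mp.spawn(worker, args=(WORLD_SIZE,), nprocs=WORLD_SIZE, join=True)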
2025-12-04T13:24:33.7714071Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f0b0cc29b6fdaf42.xml - 2025-12-04T13:24:33.7714133Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7714388Z FAILED [9.5195s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7714434Z Traceback (most recent call last): 2025-12-04T13:24:33.7714598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7714640Z getattr(self, test_name)() 2025-12-04T13:24:33.7714799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7714833Z fn() 2025-12-04T13:24:33.7714983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7715022Z method(*args, **kwargs) 2025-12-04T13:24:33.7715172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7715212Z method(*args, **kwargs) 2025-12-04T13:24:33.7715372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7715410Z with policy(): 2025-12-04T13:24:33.7715560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7715600Z raise RuntimeError(msg) 2025-12-04T13:24:33.7715955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 
2025-12-04T13:24:33.7715957Z 2025-12-04T13:24:33.7716030Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7716276Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7716297Z 2025-12-04T13:24:33.7716384Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7716386Z 2025-12-04T13:24:33.7716444Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7718099Z Traceback (most recent call last): 2025-12-04T13:24:33.7718267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7718310Z getattr(self, test_name)() 2025-12-04T13:24:33.7718468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7718504Z fn() 2025-12-04T13:24:33.7718654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7718697Z method(*args, **kwargs) 2025-12-04T13:24:33.7718850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7718892Z method(*args, **kwargs) 2025-12-04T13:24:33.7719040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7719077Z with policy(): 2025-12-04T13:24:33.7719227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7719268Z raise RuntimeError(msg) 2025-12-04T13:24:33.7719623Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.7719627Z 2025-12-04T13:24:33.7719736Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7719972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7719975Z 2025-12-04T13:24:33.7720062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7720125Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
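All of these failures originate from the CUDA memory-leak checker enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 (the __exit__ at common_utils.py:2705 in every traceback): it records caching-allocator and driver-level memory before the test body runs and raises afterwards if both counters grew. The numbers in the message are byte counts, so on device 0 the driver retained 3986685952 - 2453667840 = 1533018112 bytes, roughly 1.43 GiB, across the test. Below is a minimal sketch of that kind of check built only from public torch.cuda calls; the real checker in torch/testing/_internal/common_utils.py is considerably more careful (per-device bookkeeping, retries after empty_cache() to filter caching-allocator noise), so treat this as an illustration, not the production logic.

    import torch

    def assert_no_cuda_leak(test_fn, device=0):
        # Snapshot allocator- and driver-level memory before the test.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        driver_before = total - free

        test_fn()

        # Flush cached blocks so allocator growth reflects live tensors only.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        driver_after = total - free

        # Flag a leak only when both counters grew, mirroring the wording of
        # the log message (allocator delta confirmed by the driver numbers).
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak: caching allocator {alloc_before} -> "
                f"{alloc_after}, driver {driver_before} -> {driver_after}"
            )

On ROCm builds the same torch.cuda entry points report HIP allocator and driver numbers, which is why the message reads "CUDA driver API" even on this MI300 runner.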
2025-12-04T13:24:33.7720186Z ======================= 1 failed, 20 deselected in 9.68s ======================= 2025-12-04T13:24:33.7720224Z Got exit code 1 2025-12-04T13:24:33.7720408Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T13:24:33.7720537Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.7720750Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2b41b41082ae04da.xml 2025-12-04T13:24:33.7720810Z ============================= test session starts ============================== 2025-12-04T13:24:33.7720922Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7720964Z cachedir: .pytest_cache 2025-12-04T13:24:33.7721122Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7721169Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7721209Z configfile: pytest.ini 2025-12-04T13:24:33.7721372Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7721447Z collecting ... collected 60 items / 10 deselected / 50 selected 2025-12-04T13:24:33.7721521Z stepcurrent: skipping 10 already run items. 2025-12-04T13:24:33.7721577Z Running 11 items in this shard 2025-12-04T13:24:33.7721581Z 2025-12-04T13:24:33.7721905Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 13:17:53.171000 460926 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 460995 2025-12-04T13:24:33.7722077Z I1204 13:17:53.172000 460926 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 460996 2025-12-04T13:24:33.7722228Z I1204 13:17:53.173000 460926 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 460997 2025-12-04T13:24:33.7722379Z I1204 13:17:53.173000 460926 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 460998 2025-12-04T13:24:33.7722967Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7723008Z _warn_cpu_init() 2025-12-04T13:24:33.7723300Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7723342Z return func(*args, **kwargs) 2025-12-04T13:24:33.7723914Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7723952Z _warn_cpu_init() 2025-12-04T13:24:33.7724516Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7724552Z _warn_cpu_init() 2025-12-04T13:24:33.7725130Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7725169Z _warn_cpu_init() 2025-12-04T13:24:33.7725314Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7725477Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7725766Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7725934Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7726230Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7726368Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7726647Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7726795Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7727073Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7727222Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7727501Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7727637Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7727915Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7728064Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7728561Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:24:33.7728678Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7728874Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7729251Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7729375Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7729590Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7729803Z [rank1]:E1204 13:18:01.010000 460996 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7729842Z dist init r=1, world=4 2025-12-04T13:24:33.7729981Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7730139Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7730441Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7730624Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7730908Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7731031Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7731310Z [rank3]:E1204 13:18:01.018000 460998 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7731459Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7731735Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7731882Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7732159Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7732295Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7732573Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7732723Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7733214Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
2025-12-04T13:24:33.7733328Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7733538Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7733912Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7734027Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7734238Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7734405Z [rank3]:E1204 13:18:01.018000 460998 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7734444Z dist init r=3, world=4 2025-12-04T13:24:33.7734591Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7734760Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7735060Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7735215Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7735498Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7735623Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7735900Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7736049Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7736325Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7736470Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7736750Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7736887Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7737165Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7737311Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7737822Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:24:33.7737939Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7738135Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7738507Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7738620Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7738843Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7739020Z [rank2]:E1204 13:18:01.087000 460997 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7739069Z dist init r=2, world=4 2025-12-04T13:24:33.7739206Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7739364Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7739650Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7739849Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7740135Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7740258Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7740535Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7740683Z [rank0]:E1204 13:18:01.095000 460995 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7740960Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7741109Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7741388Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7741524Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7741800Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7741950Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7742453Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:24:33.7742567Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7742762Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7743147Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7743273Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7743501Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7743666Z [rank0]:E1204 13:18:01.095000 460995 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7743704Z dist init r=0, world=4 2025-12-04T13:24:33.7744041Z [rank0]:[W1204 13:18:01.023051034 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7744081Z FAILED [9.7167s] [ 9%] 2025-12-04T13:24:33.7744084Z 2025-12-04T13:24:33.7744140Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7744254Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.7744301Z Traceback (most recent call last): 2025-12-04T13:24:33.7744464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7744506Z self._join_processes(fn) 2025-12-04T13:24:33.7744679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7744732Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7744909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7744952Z raise RuntimeError(error) 2025-12-04T13:24:33.7745033Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7745079Z Traceback (most recent call last): 2025-12-04T13:24:33.7745240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7745282Z getattr(self, test_name)() 2025-12-04T13:24:33.7745439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7745474Z fn() 2025-12-04T13:24:33.7745625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7745666Z method(*args, **kwargs) 2025-12-04T13:24:33.7745817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7745856Z method(*args, **kwargs) 2025-12-04T13:24:33.7746009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7746058Z with policy(): 2025-12-04T13:24:33.7746211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7746253Z raise RuntimeError(msg) 2025-12-04T13:24:33.7746617Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
2025-12-04T13:24:33.7746620Z 2025-12-04T13:24:33.7746695Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7746952Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7746955Z 2025-12-04T13:24:33.7747053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7747066Z 2025-12-04T13:24:33.7747067Z 2025-12-04T13:24:33.7747142Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7747231Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7747464Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2b41b41082ae04da.xml - 2025-12-04T13:24:33.7747524Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7747787Z FAILED [9.7167s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7747832Z Traceback (most recent call last): 2025-12-04T13:24:33.7747999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7748042Z getattr(self, test_name)() 2025-12-04T13:24:33.7748202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7748236Z fn() 2025-12-04T13:24:33.7748388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7748427Z method(*args, **kwargs) 2025-12-04T13:24:33.7748577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7748616Z method(*args, **kwargs) 2025-12-04T13:24:33.7748766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7748803Z with policy(): 2025-12-04T13:24:33.7748955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7748997Z raise RuntimeError(msg) 2025-12-04T13:24:33.7749362Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:24:33.7749365Z 2025-12-04T13:24:33.7749438Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7749683Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7749685Z 2025-12-04T13:24:33.7749818Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7749901Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
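Beyond the leak itself, each run also emits two avoidable UserWarnings: FSDP's _warn_cpu_init (the wrapped module is still on CPU when sharding initialization runs) and c10d's barrier() device-context warning. Both recommend explicit device placement via a device_id argument. A hedged sketch of those calls follows; the module, environment-variable handling, and names below are illustrative, not the test suite's code.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)

    # device_id here binds collectives (e.g. the implicit barrier) to this
    # device, addressing the c10d_logger.py warning above.
    dist.init_process_group("nccl", device_id=device)

    model = nn.Linear(8, 8)  # placeholder module, deliberately built on CPU

    # device_id here lets FSDP move the module to the GPU before sharding
    # and makes sync_module_states=True legal, as _warn_cpu_init suggests.
    fsdp_model = FSDP(model, device_id=device, sync_module_states=True)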
2025-12-04T13:24:33.7749964Z ======================= 1 failed, 10 deselected in 9.88s ======================= 2025-12-04T13:24:33.7750002Z Got exit code 1 2025-12-04T13:24:33.7750042Z Retrying single test... 2025-12-04T13:24:33.7750229Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6dc774203dff0d8d.xml 2025-12-04T13:24:33.7750287Z ============================= test session starts ============================== 2025-12-04T13:24:33.7750398Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7750440Z cachedir: .pytest_cache 2025-12-04T13:24:33.7750598Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7750645Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7750699Z configfile: pytest.ini 2025-12-04T13:24:33.7750878Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7750973Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7751214Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7751258Z Running 1 items in this shard 2025-12-04T13:24:33.7751260Z 2025-12-04T13:24:33.7751580Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 13:18:05.564000 461328 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 461397 2025-12-04T13:24:33.7751736Z I1204 13:18:05.565000 461328 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 461398 2025-12-04T13:24:33.7751887Z I1204 13:18:05.565000 461328 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 461399 2025-12-04T13:24:33.7752040Z I1204 13:18:05.566000 461328 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 461400 2025-12-04T13:24:33.7752617Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7752656Z _warn_cpu_init() 2025-12-04T13:24:33.7752953Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7752996Z return func(*args, **kwargs) 2025-12-04T13:24:33.7753566Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7753602Z _warn_cpu_init() 2025-12-04T13:24:33.7754184Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7754223Z _warn_cpu_init() 2025-12-04T13:24:33.7754786Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7754823Z _warn_cpu_init() 2025-12-04T13:24:33.7754967Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7755139Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7755441Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7755613Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7755898Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7756023Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7756302Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7756451Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7756729Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7756877Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7757153Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7757291Z [rank3]:E1204 13:18:13.492000 461400 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7757574Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7757724Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7758218Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:24:33.7758335Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7758542Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7758918Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7759032Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7759243Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7759418Z [rank3]:E1204 13:18:13.492000 461400 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7759467Z dist init r=3, world=4 2025-12-04T13:24:33.7759605Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7759812Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7760104Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7760257Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7760542Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7760669Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7760946Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7761094Z [rank1]:E1204 
13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7761369Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7761517Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7761794Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7761931Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7762209Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7762358Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7762865Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:24:33.7762980Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7763177Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7763550Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7763680Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7763905Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7764081Z [rank1]:E1204 13:18:13.494000 461398 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7764121Z dist init r=1, world=4 2025-12-04T13:24:33.7764258Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7764417Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7764708Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:24:33.7764865Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7765149Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7765273Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7765550Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7765696Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7765974Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7766121Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7766397Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7766531Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7766812Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7766972Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7767464Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:24:33.7767579Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7767773Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7768157Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7768299Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7768510Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7768674Z [rank0]:E1204 13:18:13.504000 461397 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7768712Z dist init r=0, world=4 2025-12-04T13:24:33.7768850Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7769008Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7769301Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7769455Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7769785Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7769908Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7770188Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7770336Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7770611Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7770762Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7771037Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7771174Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7771467Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7771619Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7772112Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:24:33.7772238Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7772449Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7772834Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7772947Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7773157Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7773322Z [rank2]:E1204 13:18:13.572000 461399 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7773362Z dist init r=2, world=4 2025-12-04T13:24:33.7773698Z [rank0]:[W1204 13:18:13.210255714 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7773739Z FAILED [9.8157s] [100%] 2025-12-04T13:24:33.7773741Z 2025-12-04T13:24:33.7773796Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7773909Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.7773954Z Traceback (most recent call last): 2025-12-04T13:24:33.7774118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7774162Z self._join_processes(fn) 2025-12-04T13:24:33.7774336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7774391Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7774569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7774612Z raise RuntimeError(error) 2025-12-04T13:24:33.7774693Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7774738Z Traceback (most recent call last): 2025-12-04T13:24:33.7774899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7774940Z getattr(self, test_name)() 2025-12-04T13:24:33.7775098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7775133Z fn() 2025-12-04T13:24:33.7775295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7775338Z method(*args, **kwargs) 2025-12-04T13:24:33.7775487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7775528Z method(*args, **kwargs) 2025-12-04T13:24:33.7775677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7775714Z with policy(): 2025-12-04T13:24:33.7775864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7775905Z raise RuntimeError(msg) 2025-12-04T13:24:33.7776282Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:24:33.7776307Z 2025-12-04T13:24:33.7776384Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7776630Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7776632Z 2025-12-04T13:24:33.7776719Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7776721Z 2025-12-04T13:24:33.7776723Z 2025-12-04T13:24:33.7776797Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7776884Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7777119Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6dc774203dff0d8d.xml - 2025-12-04T13:24:33.7777180Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7777441Z FAILED [9.8157s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7777486Z Traceback (most recent call last): 2025-12-04T13:24:33.7777650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7777692Z getattr(self, test_name)() 2025-12-04T13:24:33.7777851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7777886Z fn() 2025-12-04T13:24:33.7778038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7778078Z method(*args, **kwargs) 2025-12-04T13:24:33.7778229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7778269Z method(*args, **kwargs) 2025-12-04T13:24:33.7778421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7778457Z with policy(): 2025-12-04T13:24:33.7778609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7778649Z raise RuntimeError(msg) 2025-12-04T13:24:33.7779017Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:24:33.7779019Z 2025-12-04T13:24:33.7779104Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7779350Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7779352Z 2025-12-04T13:24:33.7779438Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7779499Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
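The leak report above compares caching-allocator and driver-level allocations captured before and after the test body. A minimal sketch of that kind of before/after accounting, assuming a CUDA/ROCm-capable torch build; the real check (CudaMemoryLeakCheck in torch/testing/_internal/common_utils.py) is more thorough, and `run_leak_checked` below is a hypothetical helper, not harness code:

    # Sketch only: approximates the before/after accounting behind the
    # "Caching allocator allocated memory was ... and is now ..." message above.
    import gc
    import torch

    def run_leak_checked(test_fn, device: int = 0) -> None:
        gc.collect()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)            # caching-allocator bytes
        driver_free_before, _total = torch.cuda.mem_get_info(device)  # driver-level view
        test_fn()
        gc.collect()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        driver_free_after, _ = torch.cuda.mem_get_info(device)
        # Flag only when the allocator grew and the driver confirms less free memory,
        # mirroring the "CUDA driver API confirmed a leak" wording in the log.
        if alloc_after > alloc_before and driver_free_after < driver_free_before:
            raise RuntimeError(
                f"Caching allocator allocated memory was {alloc_before} "
                f"and is now reported as {alloc_after} on device {device}."
            )

The repro line in the log shows the two environment variables that enable this path: PYTORCH_TEST_WITH_ROCM=1 routes the run through the ROCm test configuration, and PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 turns the leak check on.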
2025-12-04T13:24:33.7779561Z ======================= 1 failed, 20 deselected in 9.97s ======================= 2025-12-04T13:24:33.7779597Z Got exit code 1 2025-12-04T13:24:33.7779637Z Retrying single test... 2025-12-04T13:24:33.7779868Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0abed9af6c5e52fe.xml 2025-12-04T13:24:33.7779939Z ============================= test session starts ============================== 2025-12-04T13:24:33.7780066Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7780122Z cachedir: .pytest_cache 2025-12-04T13:24:33.7780279Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7780325Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7780365Z configfile: pytest.ini 2025-12-04T13:24:33.7780527Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7780602Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7780844Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7780889Z Running 1 items in this shard 2025-12-04T13:24:33.7780892Z 2025-12-04T13:24:33.7781212Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 13:18:18.043000 461730 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 461799 2025-12-04T13:24:33.7781368Z I1204 13:18:18.043000 461730 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 461800 2025-12-04T13:24:33.7781520Z I1204 13:18:18.044000 461730 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 461801 2025-12-04T13:24:33.7781670Z I1204 13:18:18.045000 461730 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 461802 2025-12-04T13:24:33.7782248Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7782287Z _warn_cpu_init() 2025-12-04T13:24:33.7782855Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.7782891Z _warn_cpu_init() 2025-12-04T13:24:33.7783472Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7783510Z _warn_cpu_init() 2025-12-04T13:24:33.7783802Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7783845Z return func(*args, **kwargs) 2025-12-04T13:24:33.7784430Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7784486Z _warn_cpu_init() 2025-12-04T13:24:33.7784629Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7784791Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7785079Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7785234Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7785523Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7785648Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7785926Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7786073Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7786351Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7786499Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7786775Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7786913Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7787191Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7787340Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7787847Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:24:33.7787965Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7788160Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7788545Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7788670Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7788890Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7789054Z [rank3]:E1204 13:18:25.774000 461802 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7789093Z dist init r=3, world=4 2025-12-04T13:24:33.7789231Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7789389Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7789680Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7789882Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7790169Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7790293Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7790568Z [rank2]:E1204 13:18:25.778000 461801 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7790717Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7790993Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7791141Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7791415Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7791552Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7791847Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7791996Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7792486Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:24:33.7792599Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7792809Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7793196Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7793324Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7793536Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7793699Z [rank2]:E1204 13:18:25.778000 461801 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7793738Z dist init r=2, world=4 2025-12-04T13:24:33.7793875Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7794036Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7794325Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7794479Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7794762Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7794886Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7795165Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7795312Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7795588Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7795733Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7796011Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7796157Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7796436Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7796585Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7797087Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:24:33.7797212Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7797418Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7797791Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7797905Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7798115Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7798281Z [rank0]:E1204 13:18:25.780000 461799 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7798320Z dist init r=0, world=4 2025-12-04T13:24:33.7798458Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7798615Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7798905Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7799060Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7799348Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7799473Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7799786Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7799933Z [rank1]:E1204 13:18:25.789000 461800 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7800211Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7800376Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7800653Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7800788Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7801066Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7801216Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7801721Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:24:33.7801861Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7802057Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7802430Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7802545Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7802758Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7802921Z [rank1]:E1204 13:18:25.789000 461800 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7802960Z dist init r=1, world=4 2025-12-04T13:24:33.7803295Z [rank0]:[W1204 13:18:26.487815633 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7803334Z FAILED [9.6138s] [100%] 2025-12-04T13:24:33.7803336Z 2025-12-04T13:24:33.7803392Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7803506Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.7803552Z Traceback (most recent call last): 2025-12-04T13:24:33.7803716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7803758Z self._join_processes(fn) 2025-12-04T13:24:33.7803931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7803984Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7804161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7804204Z raise RuntimeError(error) 2025-12-04T13:24:33.7804284Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7804329Z Traceback (most recent call last): 2025-12-04T13:24:33.7804503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7804547Z getattr(self, test_name)() 2025-12-04T13:24:33.7804703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7804738Z fn() 2025-12-04T13:24:33.7804888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7804930Z method(*args, **kwargs) 2025-12-04T13:24:33.7805078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7805118Z method(*args, **kwargs) 2025-12-04T13:24:33.7805278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7805326Z with policy(): 2025-12-04T13:24:33.7805477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7805529Z raise RuntimeError(msg) 2025-12-04T13:24:33.7805893Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
2025-12-04T13:24:33.7805896Z 2025-12-04T13:24:33.7805972Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7806217Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7806220Z 2025-12-04T13:24:33.7806308Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7806311Z 2025-12-04T13:24:33.7806313Z 2025-12-04T13:24:33.7806390Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7806477Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7806710Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0abed9af6c5e52fe.xml - 2025-12-04T13:24:33.7806770Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7807034Z FAILED [9.6138s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7807079Z Traceback (most recent call last): 2025-12-04T13:24:33.7807242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7807286Z getattr(self, test_name)() 2025-12-04T13:24:33.7807447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7807481Z fn() 2025-12-04T13:24:33.7807633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7807672Z method(*args, **kwargs) 2025-12-04T13:24:33.7807822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7807860Z method(*args, **kwargs) 2025-12-04T13:24:33.7808009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7808046Z with policy(): 2025-12-04T13:24:33.7808207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7808249Z raise RuntimeError(msg) 2025-12-04T13:24:33.7808614Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:24:33.7808616Z 2025-12-04T13:24:33.7808692Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7808935Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7808937Z 2025-12-04T13:24:33.7809023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7809094Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
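Both runs also surface the same pair of process-group warnings: destroy_process_group() was not called before program exit, and barrier() suggests passing `device_id` to `init_process_group`. A minimal sketch of the lifecycle those warnings ask for, assuming a torchrun-style launcher that exports MASTER_ADDR/MASTER_PORT, RANK, WORLD_SIZE, and LOCAL_RANK (the `main` wrapper is illustrative, not part of the test suite):

    # Sketch of the process-group lifecycle the warnings above recommend.
    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        local_rank = int(os.environ["LOCAL_RANK"])
        dist.init_process_group(
            backend="nccl",  # maps to RCCL on ROCm builds
            rank=int(os.environ["RANK"]),
            world_size=int(os.environ["WORLD_SIZE"]),
            device_id=torch.device(f"cuda:{local_rank}"),  # silences the barrier() warning
        )
        try:
            dist.barrier()  # collectives now have an unambiguous device
        finally:
            dist.destroy_process_group()  # avoids the resource-leak warning at exit

    if __name__ == "__main__":
        main()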
2025-12-04T13:24:33.7809166Z ======================= 1 failed, 20 deselected in 9.78s ======================= 2025-12-04T13:24:33.7809213Z Got exit code 1 2025-12-04T13:24:33.7809408Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.7809535Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.7809770Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e4d5cf790a3e2c65.xml 2025-12-04T13:24:33.7809827Z ============================= test session starts ============================== 2025-12-04T13:24:33.7809938Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7809979Z cachedir: .pytest_cache 2025-12-04T13:24:33.7810137Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7810184Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7810225Z configfile: pytest.ini 2025-12-04T13:24:33.7810388Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7810462Z collecting ... collected 60 items / 11 deselected / 49 selected 2025-12-04T13:24:33.7810515Z stepcurrent: skipping 11 already run items. 2025-12-04T13:24:33.7810558Z Running 10 items in this shard 2025-12-04T13:24:33.7810560Z 2025-12-04T13:24:33.7810873Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 13:18:30.076000 462132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 462201 2025-12-04T13:24:33.7811027Z I1204 13:18:30.077000 462132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 462202 2025-12-04T13:24:33.7811181Z I1204 13:18:30.078000 462132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 462203 2025-12-04T13:24:33.7811331Z I1204 13:18:30.078000 462132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 462204 2025-12-04T13:24:33.7811627Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7811678Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7812270Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7812310Z _warn_cpu_init() 2025-12-04T13:24:33.7812597Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:24:33.7812677Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7812962Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7813012Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7813593Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7813662Z _warn_cpu_init() 2025-12-04T13:24:33.7813948Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7813996Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7814279Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7814327Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7814894Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7814932Z _warn_cpu_init() 2025-12-04T13:24:33.7815501Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7815540Z _warn_cpu_init() 2025-12-04T13:24:33.7815825Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7815902Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7816185Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:24:33.7816260Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7816545Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7816628Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7817928Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7818075Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7818302Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7818346Z return func(*args, **kwargs) 2025-12-04T13:24:33.7819611Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7819771Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7819999Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7820040Z return func(*args, **kwargs) 2025-12-04T13:24:33.7821320Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7821445Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7821683Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7821726Z return func(*args, **kwargs) 2025-12-04T13:24:33.7823005Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7823153Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7823378Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7823419Z return func(*args, **kwargs) 2025-12-04T13:24:33.7823642Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7823682Z return func(*args, **kwargs) 2025-12-04T13:24:33.7823905Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7823947Z return func(*args, **kwargs) 2025-12-04T13:24:33.7824164Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7824205Z return func(*args, **kwargs) 2025-12-04T13:24:33.7824422Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7824463Z return func(*args, **kwargs) 2025-12-04T13:24:33.7824753Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7824795Z return func(*args, **kwargs) 2025-12-04T13:24:33.7824940Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7825103Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7825394Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7825550Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7825846Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7825972Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7826251Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7826399Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7826676Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7826833Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7827122Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7827270Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7827552Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7827701Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7828192Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver
API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 164352 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:24:33.7828310Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7828506Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7828872Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7828987Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7829200Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7829366Z [rank1]:E1204 13:18:37.571000 462202 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7829404Z dist init r=1, world=4 2025-12-04T13:24:33.7829543Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7829741Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7830034Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7830205Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7830492Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7830615Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7830891Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7831038Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7831327Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7831506Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7831781Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7831917Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7832199Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7832347Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7832834Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 2025-12-04T13:24:33.7832947Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7833142Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7833507Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7833622Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7833834Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7833998Z [rank0]:E1204 13:18:37.578000 462201 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7834037Z dist init r=0, world=4 2025-12-04T13:24:33.7834173Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7834333Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7834633Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7834789Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7835073Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7835197Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7835487Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7835654Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7835931Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7836077Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7836354Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7836491Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7836773Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7836923Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7837406Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 168448 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 
2025-12-04T13:24:33.7837520Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7837716Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7838080Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7838192Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7838403Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7838568Z [rank3]:E1204 13:18:37.635000 462204 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7838607Z dist init r=3, world=4 2025-12-04T13:24:33.7838755Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7838915Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7839205Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7839358Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7839655Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7839825Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7840118Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7840265Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7840540Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7840688Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7840963Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7841101Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7841377Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7841526Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7842011Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 160256 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 2025-12-04T13:24:33.7842125Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7842320Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7842681Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7842794Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7843017Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7843183Z [rank2]:E1204 13:18:37.646000 462203 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7843222Z dist init r=2, world=4 2025-12-04T13:24:33.7843557Z [rank0]:[W1204 13:18:37.274631274 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7843597Z FAILED [9.5164s] [ 10%] 2025-12-04T13:24:33.7843599Z 2025-12-04T13:24:33.7843655Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7843756Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T13:24:33.7843815Z Traceback (most recent call last): 2025-12-04T13:24:33.7843994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7844047Z self._join_processes(fn) 2025-12-04T13:24:33.7844220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7844273Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7844451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7844494Z raise RuntimeError(error) 2025-12-04T13:24:33.7844573Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7844618Z Traceback (most recent call last): 2025-12-04T13:24:33.7844780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7844822Z getattr(self, test_name)() 2025-12-04T13:24:33.7844980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7845016Z fn() 2025-12-04T13:24:33.7845168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7845210Z method(*args, **kwargs) 2025-12-04T13:24:33.7845360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7845401Z method(*args, **kwargs) 2025-12-04T13:24:33.7845550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7845587Z with policy(): 2025-12-04T13:24:33.7845739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7845781Z raise RuntimeError(msg) 2025-12-04T13:24:33.7846143Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 
2025-12-04T13:24:33.7846146Z 2025-12-04T13:24:33.7846222Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7846458Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7846460Z 2025-12-04T13:24:33.7846547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7846549Z 2025-12-04T13:24:33.7846608Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7846655Z Traceback (most recent call last): 2025-12-04T13:24:33.7846829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7846871Z getattr(self, test_name)() 2025-12-04T13:24:33.7847030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7847064Z fn() 2025-12-04T13:24:33.7847214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7847253Z method(*args, **kwargs) 2025-12-04T13:24:33.7847403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7847441Z method(*args, **kwargs) 2025-12-04T13:24:33.7847607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7847643Z with policy(): 2025-12-04T13:24:33.7847807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7847859Z raise RuntimeError(msg) 2025-12-04T13:24:33.7848220Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 164352 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:24:33.7848222Z 2025-12-04T13:24:33.7848295Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7848530Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7848532Z 2025-12-04T13:24:33.7848619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7848622Z 2025-12-04T13:24:33.7848624Z 2025-12-04T13:24:33.7848700Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7848789Z Process 0 terminated with exit code 10, terminating remaining processes. 
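The leak checker that raised the RuntimeError above (the `policy()` context manager in torch/testing/_internal/common_utils.py, enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1) works by snapshotting CUDA memory statistics before the test body and comparing them after it exits. The following is only a minimal sketch of that before/after comparison using public torch.cuda APIs; the `check_for_cuda_leak` helper, its `fn` argument, and `tol_bytes` are invented for illustration and are not the harness's actual interface:

    import gc
    import torch

    def check_for_cuda_leak(fn, device=0, tol_bytes=0):
        # Settle pending kernels and drop cached blocks before the baseline read.
        gc.collect()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated(device)
        fn()  # the test body under scrutiny
        # Same settling pass afterwards so the two readings are comparable.
        gc.collect()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after - before > tol_bytes:
            raise RuntimeError(
                f"possible CUDA leak: allocator reported {before} bytes "
                f"before and {after} bytes after on device {device}"
            )

The failures above show exactly that pattern on every rank: 512 bytes allocated before versus roughly 150-170 KB after, alongside about 1.5 GB of additional driver-allocated memory.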
2025-12-04T13:24:33.7849021Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e4d5cf790a3e2c65.xml - 2025-12-04T13:24:33.7849082Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7849335Z FAILED [9.5164s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.7849381Z Traceback (most recent call last): 2025-12-04T13:24:33.7849544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7849588Z getattr(self, test_name)() 2025-12-04T13:24:33.7849791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7849827Z fn() 2025-12-04T13:24:33.7849978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7850018Z method(*args, **kwargs) 2025-12-04T13:24:33.7850167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7850207Z method(*args, **kwargs) 2025-12-04T13:24:33.7850359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7850396Z with policy(): 2025-12-04T13:24:33.7850548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7850604Z raise RuntimeError(msg) 2025-12-04T13:24:33.7850962Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 
2025-12-04T13:24:33.7850965Z 2025-12-04T13:24:33.7851038Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7851274Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7851276Z 2025-12-04T13:24:33.7851361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7851364Z 2025-12-04T13:24:33.7851422Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.7851479Z Traceback (most recent call last): 2025-12-04T13:24:33.7851655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7851711Z getattr(self, test_name)() 2025-12-04T13:24:33.7851869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7851905Z fn() 2025-12-04T13:24:33.7852056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7852096Z method(*args, **kwargs) 2025-12-04T13:24:33.7852244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7852284Z method(*args, **kwargs) 2025-12-04T13:24:33.7852433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7852470Z with policy(): 2025-12-04T13:24:33.7852622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7852663Z raise RuntimeError(msg) 2025-12-04T13:24:33.7853016Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 164352 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:24:33.7853018Z 2025-12-04T13:24:33.7853092Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7853324Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7853326Z 2025-12-04T13:24:33.7853414Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7853478Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.7853542Z ======================= 1 failed, 11 deselected in 9.67s ======================= 2025-12-04T13:24:33.7853579Z Got exit code 1 2025-12-04T13:24:33.7853620Z Retrying single test... 
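The retry below reproduces the same failure along with the same setup warnings (CPU sharding init, the barrier() device-context warning, and the missing destroy_process_group() at exit). As a rough sketch only, this is the per-rank setup those warning messages themselves recommend; the `run_rank` helper is hypothetical, and rendezvous variables such as MASTER_ADDR/MASTER_PORT are assumed to be provided by the launcher:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def run_rank(rank: int, world_size: int, model: torch.nn.Module):
        torch.cuda.set_device(rank)
        # Binding the process group to a device silences the
        # "barrier(): using the device under current context" warning.
        dist.init_process_group(
            "nccl", rank=rank, world_size=world_size,
            device_id=torch.device("cuda", rank),
        )
        try:
            # device_id runs FSDP's sharding initialization on the GPU,
            # avoiding the CPU-init UserWarning from _init_utils.py.
            fsdp_model = FSDP(model, device_id=rank)
            ...  # forward/backward test body
        finally:
            # Explicit teardown avoids the "destroy_process_group() was not
            # called before program exit" warning at the end of each run.
            dist.destroy_process_group()

Separately, the FutureWarning in these runs notes that the `NO_SHARD` strategy exercised by this test is itself deprecated in favor of `DistributedDataParallel`.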
2025-12-04T13:24:33.7853808Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a7d0e5b822d8632d.xml 2025-12-04T13:24:33.7853866Z ============================= test session starts ============================== 2025-12-04T13:24:33.7853978Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7854019Z cachedir: .pytest_cache 2025-12-04T13:24:33.7854178Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7854224Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7854265Z configfile: pytest.ini 2025-12-04T13:24:33.7854437Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7854514Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7854745Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7854789Z Running 1 items in this shard 2025-12-04T13:24:33.7854791Z 2025-12-04T13:24:33.7855103Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 13:18:41.999000 462534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 462603 2025-12-04T13:24:33.7855269Z I1204 13:18:42.000000 462534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 462604 2025-12-04T13:24:33.7855431Z I1204 13:18:42.001000 462534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 462605 2025-12-04T13:24:33.7855592Z I1204 13:18:42.001000 462534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 462606 2025-12-04T13:24:33.7855885Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7855936Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7856223Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7856272Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7856558Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7856607Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7856891Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:24:33.7856938Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7857522Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7857561Z _warn_cpu_init() 2025-12-04T13:24:33.7858126Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7858163Z _warn_cpu_init() 2025-12-04T13:24:33.7858736Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7858775Z _warn_cpu_init() 2025-12-04T13:24:33.7859338Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7859374Z _warn_cpu_init() 2025-12-04T13:24:33.7859674Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7859798Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7860101Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7860177Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7860465Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7860538Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7860824Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7860899Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7862187Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7862314Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7862543Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7862586Z return func(*args, **kwargs) 2025-12-04T13:24:33.7863882Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7864007Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7864231Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7864273Z return func(*args, **kwargs) 2025-12-04T13:24:33.7865551Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7865698Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7866966Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7867094Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7867322Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7867427Z return func(*args, **kwargs) 2025-12-04T13:24:33.7867731Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7867804Z return func(*args, **kwargs) 2025-12-04T13:24:33.7868038Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7868080Z return func(*args, **kwargs) 2025-12-04T13:24:33.7868311Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7868353Z return func(*args, **kwargs) 2025-12-04T13:24:33.7868571Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7868869Z return func(*args, **kwargs) 2025-12-04T13:24:33.7869174Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7869260Z return func(*args, **kwargs) 2025-12-04T13:24:33.7869423Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7869628Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7869998Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7870178Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7870464Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7870590Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7870868Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7871016Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7871299Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7871447Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7871726Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7871906Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7872198Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7872408Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7872972Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 
2025-12-04T13:24:33.7873090Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7873285Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7873649Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7873779Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7873999Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7874190Z [rank2]:E1204 13:18:49.406000 462605 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7874231Z dist init r=2, world=4 2025-12-04T13:24:33.7874370Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7874531Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7874819Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7874975Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7875259Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7875383Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7875659Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7875806Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7876083Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7876230Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7876505Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7876640Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7876920Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7877085Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7877571Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 2025-12-04T13:24:33.7877685Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7877880Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7878256Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7878393Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7878604Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7878768Z [rank0]:E1204 13:18:49.412000 462603 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7878806Z dist init r=0, world=4 2025-12-04T13:24:33.7878943Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7879102Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7879396Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7879551Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7879884Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7880007Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7880286Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7880434Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.7880710Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7880856Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7881130Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7881266Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7881555Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7881705Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7882189Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 164352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:24:33.7882314Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7882511Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7882898Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7883011Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7883224Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7883389Z [rank3]:E1204 13:18:49.461000 462606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7883428Z dist init r=3, world=4 2025-12-04T13:24:33.7883566Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7883727Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7884012Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7884166Z [rank1]:E1204 13:18:49.467000 462604 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7884449Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7884575Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7884850Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7884998Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7885273Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7885418Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7885703Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7885840Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7886117Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7886265Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7886760Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 
2025-12-04T13:24:33.7886901Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7887095Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7887458Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7887569Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7887781Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7887945Z [rank1]:E1204 13:18:49.467000 462604 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7887984Z dist init r=1, world=4 2025-12-04T13:24:33.7888320Z [rank0]:[W1204 13:18:49.109439137 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7888359Z FAILED [9.3160s] [100%] 2025-12-04T13:24:33.7888361Z 2025-12-04T13:24:33.7888418Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7888519Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T13:24:33.7888565Z Traceback (most recent call last): 2025-12-04T13:24:33.7888731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7888777Z self._join_processes(fn) 2025-12-04T13:24:33.7888950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7889004Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7889181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7889225Z raise RuntimeError(error) 2025-12-04T13:24:33.7889304Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7889350Z Traceback (most recent call last): 2025-12-04T13:24:33.7889511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7889554Z getattr(self, test_name)() 2025-12-04T13:24:33.7889780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7889817Z fn() 2025-12-04T13:24:33.7889967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7890008Z method(*args, **kwargs) 2025-12-04T13:24:33.7890158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7890198Z method(*args, **kwargs) 2025-12-04T13:24:33.7890348Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7890384Z with policy(): 2025-12-04T13:24:33.7890550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7890602Z raise RuntimeError(msg) 2025-12-04T13:24:33.7890965Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 2025-12-04T13:24:33.7890982Z 2025-12-04T13:24:33.7891057Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7891293Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7891296Z 2025-12-04T13:24:33.7891383Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7891385Z 2025-12-04T13:24:33.7891386Z 2025-12-04T13:24:33.7891464Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7891554Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.7891788Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a7d0e5b822d8632d.xml - 2025-12-04T13:24:33.7891849Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7892102Z FAILED [9.3160s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7892149Z Traceback (most recent call last): 2025-12-04T13:24:33.7892311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7892354Z getattr(self, test_name)() 2025-12-04T13:24:33.7892512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7892549Z fn() 2025-12-04T13:24:33.7892698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7892739Z method(*args, **kwargs) 2025-12-04T13:24:33.7892887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7892927Z method(*args, **kwargs) 2025-12-04T13:24:33.7893074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7893111Z with policy(): 2025-12-04T13:24:33.7893262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7893303Z raise RuntimeError(msg) 2025-12-04T13:24:33.7893672Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 2. 
CUDA driver allocated memory was 2300575744 and is now 3858759680. 2025-12-04T13:24:33.7893677Z 2025-12-04T13:24:33.7893751Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7893988Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7893990Z 2025-12-04T13:24:33.7894075Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7894138Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.7894199Z ======================= 1 failed, 20 deselected in 9.47s ======================= 2025-12-04T13:24:33.7894236Z Got exit code 1 2025-12-04T13:24:33.7894285Z Retrying single test... 2025-12-04T13:24:33.7894479Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8b4a4655aafa0335.xml 2025-12-04T13:24:33.7894566Z ============================= test session starts ============================== 2025-12-04T13:24:33.7894679Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7894719Z cachedir: .pytest_cache 2025-12-04T13:24:33.7894878Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7894924Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7894965Z configfile: pytest.ini 2025-12-04T13:24:33.7895126Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7895200Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.7895432Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7895477Z Running 1 items in this shard 2025-12-04T13:24:33.7895479Z 2025-12-04T13:24:33.7895792Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 13:18:53.932000 462936 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 463005 2025-12-04T13:24:33.7895945Z I1204 13:18:53.933000 462936 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 463006 2025-12-04T13:24:33.7896097Z I1204 13:18:53.934000 462936 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 463007 2025-12-04T13:24:33.7896247Z I1204 13:18:53.934000 462936 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 463008 2025-12-04T13:24:33.7896539Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7896592Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7897166Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7897204Z _warn_cpu_init() 2025-12-04T13:24:33.7897505Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7897556Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7898126Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7898164Z _warn_cpu_init() 2025-12-04T13:24:33.7898449Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7898539Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7898836Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7898925Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7899208Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7899256Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7899537Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7899584Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.7900202Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7900239Z _warn_cpu_init() 2025-12-04T13:24:33.7900804Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7900842Z _warn_cpu_init() 2025-12-04T13:24:33.7901128Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7901204Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7901486Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.7901559Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.7902852Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:24:33.7902980Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:24:33.7903221Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.7903292Z return func(*args, **kwargs) 2025-12-04T13:24:33.7904552Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
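The AccumulateGrad stream-mismatch UserWarning above lists its own remedies. A minimal CPU-only sketch of both, assuming nothing beyond the warning text itself; the `set_warn_on_accumulate_grad_stream_mismatch` toggle is quoted verbatim from the message (availability depends on the build), and the tiny model is purely illustrative:

    import torch
    import torch.nn as nn

    # Remedy 1 from the warning: drop references to the autograd graph
    # between iterations so stale AccumulateGrad nodes are not kept alive.
    model = nn.Linear(8, 8)
    loss = model(torch.randn(4, 8)).sum()
    loss.backward()
    del loss  # the graph (and its AccumulateGrad nodes) can now be freed

    # Remedy 2: if the mismatch is intentional, silence the check globally
    # with the toggle the warning names.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)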
2025-12-04T13:24:33.7904678Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-12-04T13:24:33.7904903Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7904945Z return func(*args, **kwargs)
2025-12-04T13:24:33.7906206Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T13:24:33.7906329Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-12-04T13:24:33.7906552Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7906593Z return func(*args, **kwargs)
2025-12-04T13:24:33.7907872Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T13:24:33.7907995Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-12-04T13:24:33.7908241Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7908281Z return func(*args, **kwargs)
2025-12-04T13:24:33.7908504Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
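The FutureWarnings repeated throughout this session deprecate the ``NO_SHARD`` sharding strategy and point at DistributedDataParallel. A hedged sketch of that migration, assuming a process group is already initialized and one GPU per process; the names here are illustrative, not the test harness's code:

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    # NO_SHARD keeps full parameters on every rank, which is DDP's native
    # behavior, so the migration is a wrapper swap around the same module.
    rank = torch.cuda.current_device()         # assumes one GPU per process
    model = nn.Linear(8, 8).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])  # in place of FSDP(..., NO_SHARD)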
2025-12-04T13:24:33.7908544Z return func(*args, **kwargs)
2025-12-04T13:24:33.7908763Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7908803Z return func(*args, **kwargs)
2025-12-04T13:24:33.7909023Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7909064Z return func(*args, **kwargs)
2025-12-04T13:24:33.7909282Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.7909322Z return func(*args, **kwargs)
2025-12-04T13:24:33.7909611Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.7909652Z return func(*args, **kwargs)
2025-12-04T13:24:33.7909840Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.7910004Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.7910297Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7910454Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.7910740Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7910865Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.7911156Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7911306Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7911582Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7911728Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7912003Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7912163Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.7912456Z
[rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7912620Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7913106Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 2025-12-04T13:24:33.7913223Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7913420Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7913785Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7913898Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7914109Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7914273Z [rank2]:E1204 13:19:01.472000 463007 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7914313Z dist init r=2, world=4 2025-12-04T13:24:33.7914454Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7914614Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7914901Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7915053Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7915337Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7915470Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7915747Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7915894Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:24:33.7916169Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7916325Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7916601Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7916758Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7917034Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7917182Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7917665Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 160256 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:24:33.7917781Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7917977Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7918338Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7918451Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7918662Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7918827Z [rank3]:E1204 13:19:01.479000 463008 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.7918865Z dist init r=3, world=4 2025-12-04T13:24:33.7919004Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7919164Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7919448Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7919612Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:24:33.7919929Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7920054Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7920329Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7920478Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7920767Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7920938Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7921213Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7921348Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7921627Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7921774Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7922256Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 
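Each rank reports the same shape of failure: the leak-check policy snapshots the caching-allocator counter on entry and compares it on exit, raising when it grew (512 -> 147968 bytes on device 0 above). A toy sketch of that before/after pattern, not the actual policy in common_utils.py:

    import torch

    class CudaLeakCheck:
        """Illustrative stand-in for PYTORCH_TEST_CUDA_MEM_LEAK_CHECK."""

        def __init__(self, device=0):
            self.device = device

        def __enter__(self):
            torch.cuda.synchronize(self.device)
            self.before = torch.cuda.memory_allocated(self.device)
            return self

        def __exit__(self, exc_type, exc, tb):
            torch.cuda.synchronize(self.device)
            after = torch.cuda.memory_allocated(self.device)
            if exc_type is None and after > self.before:
                raise RuntimeError(
                    f"leak on device {self.device}: caching allocator grew "
                    f"from {self.before} to {after} bytes")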
2025-12-04T13:24:33.7922371Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7922565Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7922927Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7923040Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7923251Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7923413Z [rank0]:E1204 13:19:01.510000 463005 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.7923451Z dist init r=0, world=4 2025-12-04T13:24:33.7923588Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7923749Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7924051Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7924205Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7924487Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7924610Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7924895Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7925053Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7925339Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7925485Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7925759Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7925895Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7926177Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7926325Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7926804Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:24:33.7926918Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7927114Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7927475Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7927587Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7927797Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7927959Z [rank1]:E1204 13:19:01.516000 463006 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7927998Z dist init r=1, world=4 2025-12-04T13:24:33.7928351Z [rank0]:[W1204 13:19:01.287185782 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.7928391Z FAILED [9.5152s] [100%] 2025-12-04T13:24:33.7928394Z 2025-12-04T13:24:33.7928450Z =================================== FAILURES =================================== 2025-12-04T13:24:33.7928552Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T13:24:33.7928597Z Traceback (most recent call last): 2025-12-04T13:24:33.7928760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.7928803Z self._join_processes(fn) 2025-12-04T13:24:33.7928988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.7929052Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.7929241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.7929285Z raise RuntimeError(error) 2025-12-04T13:24:33.7929365Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7929409Z Traceback (most recent call last): 2025-12-04T13:24:33.7929569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7929611Z getattr(self, test_name)() 2025-12-04T13:24:33.7929816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7929851Z fn() 2025-12-04T13:24:33.7930003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7930045Z method(*args, **kwargs) 2025-12-04T13:24:33.7930196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7930236Z method(*args, **kwargs) 2025-12-04T13:24:33.7930386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7930422Z with policy(): 2025-12-04T13:24:33.7930575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7930615Z raise RuntimeError(msg) 2025-12-04T13:24:33.7930976Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 
2025-12-04T13:24:33.7930980Z 2025-12-04T13:24:33.7931057Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7931293Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7931295Z 2025-12-04T13:24:33.7931382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7931384Z 2025-12-04T13:24:33.7931442Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7931487Z Traceback (most recent call last): 2025-12-04T13:24:33.7931649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7931692Z getattr(self, test_name)() 2025-12-04T13:24:33.7931849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7931885Z fn() 2025-12-04T13:24:33.7932050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7932093Z method(*args, **kwargs) 2025-12-04T13:24:33.7932240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7932281Z method(*args, **kwargs) 2025-12-04T13:24:33.7932428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7932465Z with policy(): 2025-12-04T13:24:33.7932616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7932655Z raise RuntimeError(msg) 2025-12-04T13:24:33.7933031Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 160256 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:24:33.7933059Z 2025-12-04T13:24:33.7933132Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7933367Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7933369Z 2025-12-04T13:24:33.7933455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7933457Z 2025-12-04T13:24:33.7933458Z 2025-12-04T13:24:33.7933533Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.7933621Z Process 2 terminated with exit code 10, terminating remaining processes. 
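The repro line printed above can be scripted directly. A sketch that sets the same environment variables the harness names (the command and all three variables are copied from the log; the subprocess wrapper itself is illustrative):

    import os
    import subprocess

    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
        PYTORCH_PRINT_REPRO_ON_FAILURE="1",  # "0" suppresses the repro banner
    )
    # Run from the base repo dir, as the failure message instructs.
    subprocess.run(
        ["python", "test/distributed/fsdp/test_fsdp_core.py",
         "TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda"],
        env=env,
        check=True,  # raises CalledProcessError if the test still fails
    )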
2025-12-04T13:24:33.7933855Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8b4a4655aafa0335.xml - 2025-12-04T13:24:33.7933918Z =========================== short test summary info ============================ 2025-12-04T13:24:33.7934171Z FAILED [9.5152s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.7934218Z Traceback (most recent call last): 2025-12-04T13:24:33.7934380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7934422Z getattr(self, test_name)() 2025-12-04T13:24:33.7934580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7934615Z fn() 2025-12-04T13:24:33.7934766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7934807Z method(*args, **kwargs) 2025-12-04T13:24:33.7934957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7934997Z method(*args, **kwargs) 2025-12-04T13:24:33.7935147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7935183Z with policy(): 2025-12-04T13:24:33.7935338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7935379Z raise RuntimeError(msg) 2025-12-04T13:24:33.7935735Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 
2025-12-04T13:24:33.7935739Z 2025-12-04T13:24:33.7935821Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7936057Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7936059Z 2025-12-04T13:24:33.7936144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7936146Z 2025-12-04T13:24:33.7936205Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.7936249Z Traceback (most recent call last): 2025-12-04T13:24:33.7936410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7936451Z getattr(self, test_name)() 2025-12-04T13:24:33.7936620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7936663Z fn() 2025-12-04T13:24:33.7936814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7936866Z method(*args, **kwargs) 2025-12-04T13:24:33.7937016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7937054Z method(*args, **kwargs) 2025-12-04T13:24:33.7937203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7937238Z with policy(): 2025-12-04T13:24:33.7937389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7937429Z raise RuntimeError(msg) 2025-12-04T13:24:33.7937791Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 160256 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:24:33.7937795Z 2025-12-04T13:24:33.7937868Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7938101Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7938103Z 2025-12-04T13:24:33.7938189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7938251Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
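Every "Process N exited with error code 10" above is raised by the parent after joining the workers (the _join_processes -> _check_return_codes frames in the tracebacks). A self-contained sketch of that join-and-check shape; the function names and the clean exit here are illustrative, not MultiProcessTestCase internals:

    import multiprocessing as mp
    import sys

    def _rank_main(rank, world_size):
        # A real rank would init its process group and run the test body;
        # the harness maps an in-test exception to a nonzero exit code.
        sys.exit(0)

    def join_and_check(world_size=4):
        procs = [mp.Process(target=_rank_main, args=(r, world_size))
                 for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for r, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(
                    f"Process {r} exited with error code {p.exitcode}")

    if __name__ == "__main__":
        join_and_check()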
2025-12-04T13:24:33.7938313Z ======================= 1 failed, 20 deselected in 9.67s ======================= 2025-12-04T13:24:33.7938349Z Got exit code 1 2025-12-04T13:24:33.7938534Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T13:24:33.7938663Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.7938851Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e17603ce92805b8.xml 2025-12-04T13:24:33.7938909Z ============================= test session starts ============================== 2025-12-04T13:24:33.7939021Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.7939062Z cachedir: .pytest_cache 2025-12-04T13:24:33.7939220Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.7939267Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.7939307Z configfile: pytest.ini 2025-12-04T13:24:33.7939481Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.7939558Z collecting ... collected 60 items / 12 deselected / 48 selected 2025-12-04T13:24:33.7939612Z stepcurrent: skipping 12 already run items. 2025-12-04T13:24:33.7939655Z Running 9 items in this shard 2025-12-04T13:24:33.7939657Z 2025-12-04T13:24:33.7940015Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 13:19:05.818000 463338 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 463407 2025-12-04T13:24:33.7940169Z I1204 13:19:05.819000 463338 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 463408 2025-12-04T13:24:33.7940322Z I1204 13:19:05.820000 463338 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 463409 2025-12-04T13:24:33.7940487Z I1204 13:19:05.820000 463338 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 463410 2025-12-04T13:24:33.7941083Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7941140Z _warn_cpu_init() 2025-12-04T13:24:33.7941703Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
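The _warn_cpu_init UserWarning above spells out its own fix: give FSDP a device_id so sharding initialization runs on the GPU and sync_module_states=True can broadcast. A hedged sketch of that constructor call, assuming an initialized process group; the tiny module stands in for the real model:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = nn.Linear(8, 8)  # still on CPU, as in the warning
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),  # move to GPU for shard init
        sync_module_states=True,                # broadcast needs GPU tensors
    )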
2025-12-04T13:24:33.7941742Z _warn_cpu_init() 2025-12-04T13:24:33.7942305Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7942342Z _warn_cpu_init() 2025-12-04T13:24:33.7942904Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.7942942Z _warn_cpu_init() 2025-12-04T13:24:33.7943231Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.7943273Z return func(*args, **kwargs) 2025-12-04T13:24:33.7943416Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7943578Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7943880Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7944036Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7944322Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7944447Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7944727Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7944885Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7945170Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7945328Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7945602Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7945738Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7946017Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7946165Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7946643Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:24:33.7946758Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7946957Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7947318Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T13:24:33.7947434Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7947645Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7947808Z [rank1]:E1204 13:19:13.406000 463408 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.7947847Z dist init r=1, world=4 2025-12-04T13:24:33.7947985Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7948154Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7948440Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7948593Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7948876Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7949000Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7949289Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7949458Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7949773Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7949920Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7950196Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7950333Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.7950611Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.7950759Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.7951236Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 
2025-12-04T13:24:33.7951351Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7951548Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.7951908Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T13:24:33.7952021Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.7952230Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.7952411Z [rank2]:E1204 13:19:13.410000 463409 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.7952451Z dist init r=2, world=4 2025-12-04T13:24:33.7952588Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.7952746Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.7953032Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.7953184Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.7953482Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.7953631Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.7953909Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7954056Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7954331Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.7954478Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.7954754Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.7954890Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.7955166Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7957635Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.7958129Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952.
2025-12-04T13:24:33.7958248Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7958445Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7958807Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7958922Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7959156Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7959322Z [rank0]:E1204 13:19:13.412000 463407 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T13:24:33.7959461Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.7959619Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.7959966Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7960122Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.7960447Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7960571Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.7960848Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7960997Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7961277Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7961425Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7961700Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7961837Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.7962115Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7962264Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.7962745Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208.
2025-12-04T13:24:33.7962859Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7963055Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7963430Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7963544Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7963755Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7963918Z [rank3]:E1204 13:19:13.412000 463410 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T13:24:33.7963958Z dist init r=0, world=4
2025-12-04T13:24:33.7963996Z dist init r=3, world=4
2025-12-04T13:24:33.7964345Z [rank0]:[W1204 13:19:13.098574192 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7964395Z FAILED [9.5162s] [ 11%]
2025-12-04T13:24:33.7964408Z
2025-12-04T13:24:33.7964468Z =================================== FAILURES ===================================
2025-12-04T13:24:33.7964568Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____
2025-12-04T13:24:33.7964616Z Traceback (most recent call last):
2025-12-04T13:24:33.7964781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.7964825Z self._join_processes(fn)
2025-12-04T13:24:33.7964998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.7965054Z self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.7965231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.7965280Z raise RuntimeError(error)
2025-12-04T13:24:33.7965361Z RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.7965408Z Traceback (most recent call last):
2025-12-04T13:24:33.7965569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7965612Z getattr(self, test_name)()
2025-12-04T13:24:33.7965769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7965806Z fn()
2025-12-04T13:24:33.7965956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7965998Z method(*args, **kwargs)
2025-12-04T13:24:33.7966149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7966191Z method(*args, **kwargs)
2025-12-04T13:24:33.7966341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7966380Z with policy():
2025-12-04T13:24:33.7966531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7966572Z raise RuntimeError(msg)
2025-12-04T13:24:33.7966923Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208.
2025-12-04T13:24:33.7966926Z
2025-12-04T13:24:33.7967002Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7967250Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7967254Z
2025-12-04T13:24:33.7967342Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7967345Z
2025-12-04T13:24:33.7967347Z
2025-12-04T13:24:33.7967424Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.7967513Z Process 3 terminated with exit code 10, terminating remaining processes.
2025-12-04T13:24:33.7967745Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e17603ce92805b8.xml -
2025-12-04T13:24:33.7967806Z =========================== short test summary info ============================
2025-12-04T13:24:33.7968068Z FAILED [9.5162s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.7968126Z Traceback (most recent call last):
2025-12-04T13:24:33.7968289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7968345Z getattr(self, test_name)()
2025-12-04T13:24:33.7968503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7968539Z fn()
2025-12-04T13:24:33.7968689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7968731Z method(*args, **kwargs)
2025-12-04T13:24:33.7968880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7968920Z method(*args, **kwargs)
2025-12-04T13:24:33.7969069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7969109Z with policy():
2025-12-04T13:24:33.7969259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7969300Z raise RuntimeError(msg)
2025-12-04T13:24:33.7969651Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208.
2025-12-04T13:24:33.7969655Z
2025-12-04T13:24:33.7969777Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7970007Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7970010Z
2025-12-04T13:24:33.7970096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7970161Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
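The repro line printed above is meant to be run from the repository root. A small Python wrapper that launches it with the same two environment flags (the flags, script path, and test name are copied verbatim from the log; the wrapper itself is illustrative, not part of the CI tooling):

    import os
    import subprocess
    import sys

    env = dict(os.environ)
    env["PYTORCH_TEST_WITH_ROCM"] = "1"            # run the ROCm variant of the test
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"  # enable the leak checker
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # would suppress the repro message
    subprocess.run(
        [
            sys.executable,
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda",
        ],
        env=env,
        check=False,  # the log shows a non-zero exit when the leak reproduces
    )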
2025-12-04T13:24:33.7970223Z ======================= 1 failed, 12 deselected in 9.68s =======================
2025-12-04T13:24:33.7970261Z Got exit code 1
2025-12-04T13:24:33.7970301Z Retrying single test...
2025-12-04T13:24:33.7970491Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15d4cc4b3bbdaa80.xml
2025-12-04T13:24:33.7970548Z ============================= test session starts ==============================
2025-12-04T13:24:33.7970662Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.7970703Z cachedir: .pytest_cache
2025-12-04T13:24:33.7970863Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.7970909Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.7970968Z configfile: pytest.ini
2025-12-04T13:24:33.7971133Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.7971208Z collecting ... collected 60 items / 20 deselected / 40 selected
2025-12-04T13:24:33.7971434Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7971479Z Running 1 items in this shard
2025-12-04T13:24:33.7971481Z
2025-12-04T13:24:33.7971789Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 13:19:17.695000 463740 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 463809
2025-12-04T13:24:33.7971956Z I1204 13:19:17.695000 463740 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 463810
2025-12-04T13:24:33.7972122Z I1204 13:19:17.696000 463740 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 463811
2025-12-04T13:24:33.7972287Z I1204 13:19:17.697000 463740 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 463812
2025-12-04T13:24:33.7972874Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7972912Z _warn_cpu_init()
2025-12-04T13:24:33.7973480Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7973520Z _warn_cpu_init()
2025-12-04T13:24:33.7974082Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7974120Z _warn_cpu_init()
2025-12-04T13:24:33.7974682Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.7974720Z _warn_cpu_init()
2025-12-04T13:24:33.7975016Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.7975059Z return func(*args, **kwargs)
2025-12-04T13:24:33.7975204Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.7975376Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.7975667Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7975822Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.7976108Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7976234Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.7976528Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7976697Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7976972Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7977119Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7977398Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7977537Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.7977816Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7977963Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.7978442Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856.
2025-12-04T13:24:33.7978558Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7978756Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7979114Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7979228Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7979440Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7979616Z [rank2]:E1204 13:19:25.257000 463811 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T13:24:33.7979656Z dist init r=2, world=4
2025-12-04T13:24:33.7979844Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.7980003Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.7980289Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7980442Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.7980743Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7980893Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.7981169Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7981314Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7981589Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7981736Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7982014Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7982150Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.7982428Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7982576Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.7983052Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208.
2025-12-04T13:24:33.7983168Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7983363Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7983718Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7983832Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7984060Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7984227Z [rank3]:E1204 13:19:25.261000 463812 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T13:24:33.7984265Z dist init r=3, world=4
2025-12-04T13:24:33.7984404Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.7984563Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.7984859Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7985025Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.7985320Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7985444Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.7985720Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7985866Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7986143Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7986290Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7986567Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7986703Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.7986980Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7987129Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.7987607Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072.
2025-12-04T13:24:33.7987720Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7987915Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7988281Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7988395Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7988605Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7988769Z [rank1]:E1204 13:19:25.313000 463810 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T13:24:33.7988809Z dist init r=1, world=4
2025-12-04T13:24:33.7988950Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.7989121Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.7989417Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7989582Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.7989910Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7990034Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.7990310Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7990457Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7990733Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7990877Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.7991153Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7991291Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.7991573Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7991720Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.7992196Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952.
2025-12-04T13:24:33.7992311Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7992527Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7992884Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7992995Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.7993207Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7993383Z [rank0]:E1204 13:19:25.313000 463809 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T13:24:33.7993435Z dist init r=0, world=4
2025-12-04T13:24:33.7993776Z [rank0]:[W1204 13:19:25.069903706 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.7993828Z FAILED [9.4166s] [100%]
2025-12-04T13:24:33.7993830Z
2025-12-04T13:24:33.7993887Z =================================== FAILURES ===================================
2025-12-04T13:24:33.7993987Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____
2025-12-04T13:24:33.7994034Z Traceback (most recent call last):
2025-12-04T13:24:33.7994196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.7994240Z self._join_processes(fn)
2025-12-04T13:24:33.7994414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.7994469Z self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.7994647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.7994691Z raise RuntimeError(error)
2025-12-04T13:24:33.7994771Z RuntimeError: Process 2 exited with error code 10 and exception:
2025-12-04T13:24:33.7994817Z Traceback (most recent call last):
2025-12-04T13:24:33.7994977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7995020Z getattr(self, test_name)()
2025-12-04T13:24:33.7995177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7995213Z fn()
2025-12-04T13:24:33.7995364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7995408Z method(*args, **kwargs)
2025-12-04T13:24:33.7995558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7995598Z method(*args, **kwargs)
2025-12-04T13:24:33.7995750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7995786Z with policy():
2025-12-04T13:24:33.7995940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7995981Z raise RuntimeError(msg)
2025-12-04T13:24:33.7996336Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856.
2025-12-04T13:24:33.7996350Z
2025-12-04T13:24:33.7996427Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7996656Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7996658Z
2025-12-04T13:24:33.7996745Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7996747Z
2025-12-04T13:24:33.7996749Z
2025-12-04T13:24:33.7996825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.7996913Z Process 2 terminated with exit code 10, terminating remaining processes.
2025-12-04T13:24:33.7997157Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15d4cc4b3bbdaa80.xml -
2025-12-04T13:24:33.7997220Z =========================== short test summary info ============================
2025-12-04T13:24:33.7997488Z FAILED [9.4166s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception:
2025-12-04T13:24:33.7997535Z Traceback (most recent call last):
2025-12-04T13:24:33.7997698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.7997741Z getattr(self, test_name)()
2025-12-04T13:24:33.7997898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.7997934Z fn()
2025-12-04T13:24:33.7998085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7998126Z method(*args, **kwargs)
2025-12-04T13:24:33.7998279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.7998320Z method(*args, **kwargs)
2025-12-04T13:24:33.7998469Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.7998507Z with policy():
2025-12-04T13:24:33.7998657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.7998698Z raise RuntimeError(msg)
2025-12-04T13:24:33.7999048Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856.
2025-12-04T13:24:33.7999051Z
2025-12-04T13:24:33.7999125Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.7999356Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.7999360Z
2025-12-04T13:24:33.7999446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.7999508Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
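Both retries also reprint the FSDP CPU-init UserWarning from _init_utils.py; the remedy it suggests is to hand FSDP a device_id so sharding initialization runs on the GPU. A minimal sketch under assumed conditions (the Linear module is a placeholder, and a process group is taken to be initialized already, as in these tests):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    rank = dist.get_rank()   # assumes init_process_group() has already run
    model = nn.Linear(8, 8)  # starts on CPU, like the test's wrapped model
    wrapped = FSDP(
        model,
        device_id=torch.device("cuda", rank),  # moves the module for sharding init
        sync_module_states=True,  # requires the module on GPU, per the warning
    )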
2025-12-04T13:24:33.7999570Z ======================= 1 failed, 20 deselected in 9.58s =======================
2025-12-04T13:24:33.7999609Z Got exit code 1
2025-12-04T13:24:33.7999648Z Retrying single test...
2025-12-04T13:24:33.7999877Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4857c42667e0d6ff.xml
2025-12-04T13:24:33.7999933Z ============================= test session starts ==============================
2025-12-04T13:24:33.8000048Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.8000103Z cachedir: .pytest_cache
2025-12-04T13:24:33.8000263Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.8000308Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.8000349Z configfile: pytest.ini
2025-12-04T13:24:33.8000516Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.8000591Z collecting ... collected 60 items / 20 deselected / 40 selected
2025-12-04T13:24:33.8000814Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.8000859Z Running 1 items in this shard
2025-12-04T13:24:33.8000861Z
2025-12-04T13:24:33.8001183Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 13:19:29.740000 464142 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 464211
2025-12-04T13:24:33.8001367Z I1204 13:19:29.741000 464142 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 464212
2025-12-04T13:24:33.8001519Z I1204 13:19:29.742000 464142 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 464213
2025-12-04T13:24:33.8001668Z I1204 13:19:29.742000 464142 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 464214
2025-12-04T13:24:33.8002247Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8002286Z _warn_cpu_init()
2025-12-04T13:24:33.8002857Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8002895Z _warn_cpu_init()
2025-12-04T13:24:33.8003459Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8003497Z _warn_cpu_init()
2025-12-04T13:24:33.8004058Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8004095Z _warn_cpu_init()
2025-12-04T13:24:33.8004385Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.8004440Z return func(*args, **kwargs)
2025-12-04T13:24:33.8004586Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.8004748Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8005040Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8005195Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8005492Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8005646Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8005920Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8006069Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8006344Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8006490Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8006767Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8006908Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8007188Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8007336Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8007816Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856.
2025-12-04T13:24:33.8007932Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8008128Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8008484Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.8008599Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8008823Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8008988Z [rank2]:E1204 13:19:37.260000 464213 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T13:24:33.8009027Z dist init r=2, world=4
2025-12-04T13:24:33.8009166Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.8009326Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8009621Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8009836Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8010132Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8010255Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8010532Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8010678Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8010956Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8011103Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8011378Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8011514Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8011795Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8011943Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8012421Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208.
2025-12-04T13:24:33.8012535Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8012729Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8013101Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.8013215Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8013426Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8013589Z [rank3]:E1204 13:19:37.261000 464214 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T13:24:33.8013628Z dist init r=3, world=4
2025-12-04T13:24:33.8013765Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
2025-12-04T13:24:33.8013937Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8014234Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8014400Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8014688Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8014813Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8015090Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8015237Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8015515Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8015660Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8015936Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8016072Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8016351Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8016499Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8016977Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:24:33.8017091Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8017296Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8017653Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T13:24:33.8017766Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8017976Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8018150Z [rank1]:E1204 13:19:37.263000 464212 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8018199Z dist init r=1, world=4 2025-12-04T13:24:33.8018337Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8018510Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8018795Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8018947Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8019231Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8019357Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8019633Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8019834Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.8020108Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8020255Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8020530Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8020667Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8020943Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8021091Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8021593Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:24:33.8021708Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8021904Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8022262Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T13:24:33.8022375Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8022599Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8022797Z [rank0]:E1204 13:19:37.271000 464211 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8022837Z dist init r=0, world=4 2025-12-04T13:24:33.8023170Z [rank0]:[W1204 13:19:37.970490045 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
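The ProcessGroupNCCL warning above is actionable on its own: each test process exited while the default process group was still initialized. A minimal sketch of the shutdown pattern the warning asks for, assuming the usual torchrun-style environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) are set; on ROCm builds the "nccl" backend name is backed by RCCL:

    import torch.distributed as dist

    def main() -> None:
        # init_process_group reads MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE
        # from the environment when launched under torchrun.
        dist.init_process_group(backend="nccl")
        try:
            pass  # training or test body goes here
        finally:
            # Pairing init with destroy avoids the "destroy_process_group()
            # was not called before program exit" warning logged above.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()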
2025-12-04T13:24:33.8023210Z FAILED [9.5151s] [100%]
2025-12-04T13:24:33.8023265Z =================================== FAILURES ===================================
2025-12-04T13:24:33.8023366Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____
2025-12-04T13:24:33.8023413Z Traceback (most recent call last):
2025-12-04T13:24:33.8023577Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.8023620Z     self._join_processes(fn)
2025-12-04T13:24:33.8023793Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.8023845Z     self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.8024024Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.8024067Z     raise RuntimeError(error)
2025-12-04T13:24:33.8024147Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8025719Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952.
2025-12-04T13:24:33.8026179Z Process 1 exited with error code 10 and exception:
2025-12-04T13:24:33.8027771Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072.
2025-12-04T13:24:33.8028227Z Process 2 exited with error code 10 and exception:
2025-12-04T13:24:33.8029817Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856.
2025-12-04T13:24:33.8030317Z Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.8031857Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208.
2025-12-04T13:24:33.8032328Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.8032415Z Process 0 terminated with exit code 10, terminating remaining processes.
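The leak checker's bookkeeping can be read directly off the error text: it records the caching allocator's allocated bytes and the driver-level usage per device before the test body, and re-checks both afterwards (the "with policy():" frame in the tracebacks is that context manager in common_utils.py). A rough sketch of the idea, not PyTorch's actual implementation; the class name here is illustrative:

    import torch

    class MemoryLeakCheck:
        """Illustrative stand-in for the leak-check context manager."""

        def __init__(self, device: int = 0) -> None:
            self.device = device

        def __enter__(self) -> "MemoryLeakCheck":
            torch.cuda.synchronize(self.device)
            self.alloc_before = torch.cuda.memory_allocated(self.device)
            free, total = torch.cuda.mem_get_info(self.device)
            self.driver_before = total - free
            return self

        def __exit__(self, exc_type, exc, tb) -> None:
            torch.cuda.synchronize(self.device)
            alloc_after = torch.cuda.memory_allocated(self.device)
            free, total = torch.cuda.mem_get_info(self.device)
            driver_after = total - free
            # Only flag a leak when both the caching allocator and the driver
            # report growth, mirroring the two figures quoted in the errors.
            if alloc_after > self.alloc_before and driver_after > self.driver_before:
                raise RuntimeError(
                    f"Caching allocator allocated memory was {self.alloc_before} "
                    f"and is now reported as {alloc_after} on device {self.device}. "
                    f"CUDA driver allocated memory was {self.driver_before} "
                    f"and is now {driver_after}."
                )

Requiring growth in both counters is what lets the message say the driver API "confirmed" the leak: allocator growth alone could be caching effects, but a matching rise in driver-visible memory rules that out.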
2025-12-04T13:24:33.8032644Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4857c42667e0d6ff.xml -
2025-12-04T13:24:33.8032705Z =========================== short test summary info ============================
2025-12-04T13:24:33.8032966Z FAILED [9.5151s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8041306Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
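Each failure block prints the same standalone repro command. One way to drive it from Python with the same environment flags the log quotes; this wrapper is a convenience sketch, not part of the test suite:

    import os
    import subprocess

    # Flags copied from the repro line in the log; setting
    # PYTORCH_PRINT_REPRO_ON_FAILURE=0 would suppress the banner instead.
    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
    )
    result = subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda",
        ],
        env=env,
        check=False,  # inspect result.returncode rather than raising
    )
    print("exit code:", result.returncode)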
2025-12-04T13:24:33.8041368Z ======================= 1 failed, 20 deselected in 9.68s =======================
2025-12-04T13:24:33.8041406Z Got exit code 1
2025-12-04T13:24:33.8041600Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda
2025-12-04T13:24:33.8041751Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T13:24:33.8041957Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d125d973bd61e419.xml
2025-12-04T13:24:33.8042015Z ============================= test session starts ==============================
2025-12-04T13:24:33.8042128Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.8042169Z cachedir: .pytest_cache
2025-12-04T13:24:33.8042327Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.8042373Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.8042414Z configfile: pytest.ini
2025-12-04T13:24:33.8042577Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.8042654Z collecting ... collected 60 items / 13 deselected / 47 selected
2025-12-04T13:24:33.8042708Z stepcurrent: skipping 13 already run items.
2025-12-04T13:24:33.8042758Z Running 8 items in this shard
2025-12-04T13:24:33.8043078Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 13:19:41.574000 464544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 464613
2025-12-04T13:24:33.8043232Z I1204 13:19:41.575000 464544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 464614
2025-12-04T13:24:33.8043382Z I1204 13:19:41.575000 464544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 464615
2025-12-04T13:24:33.8043533Z I1204 13:19:41.576000 464544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 464616
2025-12-04T13:24:33.8044114Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8044152Z   _warn_cpu_init()
2025-12-04T13:24:33.8046303Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T13:24:33.8046348Z   return func(*args, **kwargs)
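Both UserWarnings above point at the same fix: tell the collective layer which device to use (the barrier warning suggests `device_id` in `init_process_group`, the FSDP warning suggests `device_id` on the wrapper). A short sketch of the FSDP pattern the warning recommends, assuming one GPU per rank and an already-initialized process group; the helper name is illustrative:

    import torch
    import torch.distributed as dist
    from torch import nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def shard(module: nn.Module) -> FSDP:
        # device_id moves the CPU-resident module to the local GPU before
        # sharding init runs, which is also what sync_module_states=True
        # needs, since that flag requires GPU communication.
        local_rank = dist.get_rank() % torch.cuda.device_count()
        return FSDP(module, device_id=local_rank, sync_module_states=True)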
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8048781Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8049059Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8049222Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8049768Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:24:33.8049885Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8050079Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8050465Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8052203Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8052415Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8052581Z [rank0]:E1204 13:19:49.130000 464613 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8052621Z dist init r=0, world=4 2025-12-04T13:24:33.8052760Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8052923Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8053213Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8053378Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8053664Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8053787Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8054066Z [rank1]:E1204 13:19:49.134000 464614 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8054217Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8054494Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8054641Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8054918Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8055077Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8055356Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8055505Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8055993Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 
2025-12-04T13:24:33.8056118Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8056327Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8056770Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8056886Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8057096Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8057262Z [rank1]:E1204 13:19:49.134000 464614 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8057304Z dist init r=1, world=4 2025-12-04T13:24:33.8057440Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8057601Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8057889Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8058044Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8058330Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8058457Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8058732Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8058881Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8059157Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8059303Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8059593Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8059774Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8060060Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8060207Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8060711Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.8060863Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8061058Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8061428Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8061543Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8061757Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8061922Z [rank3]:E1204 13:19:49.140000 464616 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8061964Z dist init r=3, world=4 2025-12-04T13:24:33.8062101Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8062260Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8062547Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8062701Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8062986Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8063110Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8063389Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8063536Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.8063829Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8063976Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8064253Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8064388Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8064675Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8064824Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8065320Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:24:33.8065445Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8065642Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8066011Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8066126Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8066338Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8066503Z [rank2]:E1204 13:19:49.197000 464615 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8066542Z dist init r=2, world=4 2025-12-04T13:24:33.8066883Z [rank0]:[W1204 13:19:49.796541016 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
2025-12-04T13:24:33.8066924Z FAILED [9.5159s] [ 12%]
2025-12-04T13:24:33.8066984Z =================================== FAILURES ===================================
2025-12-04T13:24:33.8067095Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _
2025-12-04T13:24:33.8067893Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8069512Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952.
2025-12-04T13:24:33.8070027Z Process 1 exited with error code 10 and exception:
2025-12-04T13:24:33.8071595Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072.
2025-12-04T13:24:33.8072078Z Process 3 exited with error code 10 and exception:
2025-12-04T13:24:33.8073683Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208.
2025-12-04T13:24:33.8074171Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.8074258Z Process 0 terminated with exit code 10, terminating remaining processes.
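Failure propagation here is exit-code based: each failing rank exits with code 10 after logging its exception, and the parent's _join_processes/_check_return_codes frames (shown in the first traceback above) turn any nonzero child code into the pytest-level RuntimeError. A toy illustration of that mechanism with plain multiprocessing; the code-10 convention is taken from the log, the rest is illustrative:

    import multiprocessing as mp

    TEST_ERROR_EXIT_CODE = 10  # the code the failing ranks exit with above

    def worker(rank: int) -> None:
        # A failing rank exits with the error code after logging its
        # exception; rank 0 fails here for illustration.
        if rank == 0:
            raise SystemExit(TEST_ERROR_EXIT_CODE)

    if __name__ == "__main__":
        procs = [mp.Process(target=worker, args=(r,)) for r in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        failed = [(i, p.exitcode) for i, p in enumerate(procs) if p.exitcode != 0]
        if failed:
            rank, code = failed[0]
            raise RuntimeError(f"Process {rank} exited with error code {code}")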
2025-12-04T13:24:33.8074493Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d125d973bd61e419.xml - 2025-12-04T13:24:33.8074555Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8074815Z FAILED [9.5159s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8074862Z Traceback (most recent call last): 2025-12-04T13:24:33.8075025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8075067Z getattr(self, test_name)() 2025-12-04T13:24:33.8075227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8075260Z fn() 2025-12-04T13:24:33.8075412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8075462Z method(*args, **kwargs) 2025-12-04T13:24:33.8075613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8075655Z method(*args, **kwargs) 2025-12-04T13:24:33.8075805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8075843Z with policy(): 2025-12-04T13:24:33.8075994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8076036Z raise RuntimeError(msg) 2025-12-04T13:24:33.8076406Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 
2025-12-04T13:24:33.8076408Z 2025-12-04T13:24:33.8076483Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8076736Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8076750Z 2025-12-04T13:24:33.8076920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8076923Z 2025-12-04T13:24:33.8076980Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8077026Z Traceback (most recent call last): 2025-12-04T13:24:33.8077187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8077230Z getattr(self, test_name)() 2025-12-04T13:24:33.8077389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8077424Z fn() 2025-12-04T13:24:33.8077574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8077614Z method(*args, **kwargs) 2025-12-04T13:24:33.8077765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8077804Z method(*args, **kwargs) 2025-12-04T13:24:33.8077953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8077990Z with policy(): 2025-12-04T13:24:33.8078140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8078180Z raise RuntimeError(msg) 2025-12-04T13:24:33.8078542Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 
2025-12-04T13:24:33.8078545Z 2025-12-04T13:24:33.8078617Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8078858Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8078860Z 2025-12-04T13:24:33.8078945Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8078949Z 2025-12-04T13:24:33.8079006Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8079052Z Traceback (most recent call last): 2025-12-04T13:24:33.8079213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8079256Z getattr(self, test_name)() 2025-12-04T13:24:33.8079425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8079462Z fn() 2025-12-04T13:24:33.8079610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8079652Z method(*args, **kwargs) 2025-12-04T13:24:33.8079839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8079879Z method(*args, **kwargs) 2025-12-04T13:24:33.8080026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8080063Z with policy(): 2025-12-04T13:24:33.8080228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8080270Z raise RuntimeError(msg) 2025-12-04T13:24:33.8080630Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.8080662Z 2025-12-04T13:24:33.8080735Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8080975Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8080978Z 2025-12-04T13:24:33.8081062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8081125Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8081190Z ======================= 1 failed, 13 deselected in 9.68s ======================= 2025-12-04T13:24:33.8081230Z Got exit code 1 2025-12-04T13:24:33.8081270Z Retrying single test... 
2025-12-04T13:24:33.8081458Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5f100a3682b1765d.xml 2025-12-04T13:24:33.8081516Z ============================= test session starts ============================== 2025-12-04T13:24:33.8081630Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8081671Z cachedir: .pytest_cache 2025-12-04T13:24:33.8081828Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8081874Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8081915Z configfile: pytest.ini 2025-12-04T13:24:33.8082079Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8082156Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8082393Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8082439Z Running 1 items in this shard 2025-12-04T13:24:33.8082441Z 2025-12-04T13:24:33.8082757Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 13:19:53.628000 464946 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 465015 2025-12-04T13:24:33.8082913Z I1204 13:19:53.628000 464946 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 465016 2025-12-04T13:24:33.8083067Z I1204 13:19:53.629000 464946 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 465017 2025-12-04T13:24:33.8083230Z I1204 13:19:53.630000 464946 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 465018 2025-12-04T13:24:33.8083809Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8083848Z _warn_cpu_init() 2025-12-04T13:24:33.8084431Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8084481Z _warn_cpu_init() 2025-12-04T13:24:33.8085040Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8085090Z _warn_cpu_init() 2025-12-04T13:24:33.8085657Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8085698Z _warn_cpu_init() 2025-12-04T13:24:33.8085988Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8086031Z return func(*args, **kwargs) 2025-12-04T13:24:33.8086176Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8086337Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8086628Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8086783Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8087069Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8087193Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8087474Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8087634Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8087914Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8088061Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8088336Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8088473Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8088762Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8088920Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8089422Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:24:33.8089538Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8089780Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8090153Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8090268Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8090480Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8090644Z [rank0]:E1204 13:20:01.131000 465015 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8090684Z dist init r=0, world=4 2025-12-04T13:24:33.8090822Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8090985Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8091272Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8091428Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8091712Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8091839Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8092130Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8092280Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8092561Z [rank1]:E1204 13:20:01.133000 465016 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8092707Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8092996Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8093146Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8093438Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8093584Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8094072Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:24:33.8094188Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8094383Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8094757Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8094873Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8095086Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8095250Z [rank1]:E1204 13:20:01.133000 465016 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8095291Z dist init r=1, world=4 2025-12-04T13:24:33.8095435Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8095595Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8095882Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8096035Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8096329Z 
[rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8096454Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8096732Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8096878Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8097164Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8097312Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8097599Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8097747Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8098023Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8098176Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8098664Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 
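On the barrier() UserWarning from c10d_logger.py seen in these sessions ("using the device under current context"): as the message itself says, passing `device_id` to `init_process_group` binds the group to a device and mutes it. A minimal sketch; the rank/world_size values and the env:// rendezvous (MASTER_ADDR/MASTER_PORT) are hypothetical setup, not taken from this job:

    # Hedged sketch: device-bound process group init, per the warning text.
    import torch
    import torch.distributed as dist

    rank, world_size = 0, 4                      # hypothetical per-rank values
    torch.cuda.set_device(rank)                  # one GPU per rank
    dist.init_process_group(
        backend="nccl",                          # NCCL maps to RCCL on ROCm
        rank=rank,
        world_size=world_size,
        device_id=torch.device(f"cuda:{rank}"),  # silences the barrier() warning
    )
    dist.barrier()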
2025-12-04T13:24:33.8098781Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8098975Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8099343Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8099457Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8099668Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8099876Z [rank2]:E1204 13:20:01.186000 465017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8099915Z dist init r=2, world=4 2025-12-04T13:24:33.8100053Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8100211Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8100521Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8100677Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8100961Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8101085Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8101377Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8101526Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8101814Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8101976Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8102252Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8102388Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8102667Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8102815Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8103302Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:24:33.8103416Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8103610Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8103978Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8104092Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8104302Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8104465Z [rank3]:E1204 13:20:01.187000 465018 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8104505Z dist init r=3, world=4 2025-12-04T13:24:33.8104851Z [rank0]:[W1204 13:20:01.810490110 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8104893Z FAILED [9.3169s] [100%] 2025-12-04T13:24:33.8104896Z 2025-12-04T13:24:33.8104952Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8105061Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _ 2025-12-04T13:24:33.8105109Z Traceback (most recent call last): 2025-12-04T13:24:33.8105270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8105314Z self._join_processes(fn) 2025-12-04T13:24:33.8105486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8105551Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8105730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8105786Z raise RuntimeError(error) 2025-12-04T13:24:33.8105866Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8105923Z Traceback (most recent call last): 2025-12-04T13:24:33.8106082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8106125Z getattr(self, test_name)() 2025-12-04T13:24:33.8106282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8106319Z fn() 2025-12-04T13:24:33.8106472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8106514Z method(*args, **kwargs) 2025-12-04T13:24:33.8106664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8106707Z method(*args, **kwargs) 2025-12-04T13:24:33.8106856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8106896Z with policy(): 2025-12-04T13:24:33.8107046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8107088Z raise RuntimeError(msg) 2025-12-04T13:24:33.8107450Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 
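On the ProcessGroupNCCL.cpp warning above ("destroy_process_group() was not called before program exit, which can leak resources"): the recommended fix is an explicit teardown on every rank before the process exits. A hedged sketch of the pattern; `run_rank` and its body are placeholders, not the test harness:

    # Hedged sketch: explicit process-group shutdown, per the warning.
    import torch.distributed as dist

    def run_rank(rank, world_size):
        try:
            ...  # per-rank collective work would go here (placeholder)
        finally:
            if dist.is_initialized():
                dist.destroy_process_group()  # explicit shutdown before exit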
2025-12-04T13:24:33.8107453Z 2025-12-04T13:24:33.8107529Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8107771Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8107775Z 2025-12-04T13:24:33.8107863Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8107865Z 2025-12-04T13:24:33.8107867Z 2025-12-04T13:24:33.8107942Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8108030Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8108261Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5f100a3682b1765d.xml - 2025-12-04T13:24:33.8108324Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8108595Z FAILED [9.3169s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8108644Z Traceback (most recent call last): 2025-12-04T13:24:33.8108807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8108850Z getattr(self, test_name)() 2025-12-04T13:24:33.8109007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8109043Z fn() 2025-12-04T13:24:33.8109192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8109234Z method(*args, **kwargs) 2025-12-04T13:24:33.8109392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8109434Z method(*args, **kwargs) 2025-12-04T13:24:33.8109594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8109632Z with policy(): 2025-12-04T13:24:33.8109825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8109866Z raise RuntimeError(msg) 2025-12-04T13:24:33.8110224Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:24:33.8110227Z 2025-12-04T13:24:33.8110301Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8110545Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8110548Z 2025-12-04T13:24:33.8110635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8110698Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:24:33.8110759Z ======================= 1 failed, 20 deselected in 9.46s ======================= 2025-12-04T13:24:33.8110797Z Got exit code 1 2025-12-04T13:24:33.8110837Z Retrying single test... 2025-12-04T13:24:33.8111027Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e015ef1a1cd9270.xml 2025-12-04T13:24:33.8111084Z ============================= test session starts ============================== 2025-12-04T13:24:33.8111198Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8111239Z cachedir: .pytest_cache 2025-12-04T13:24:33.8111398Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8111445Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8111485Z configfile: pytest.ini 2025-12-04T13:24:33.8111648Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8111722Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8111959Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8112004Z Running 1 items in this shard 2025-12-04T13:24:33.8112007Z 2025-12-04T13:24:33.8112344Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 13:20:05.620000 465348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 465417 2025-12-04T13:24:33.8112500Z I1204 13:20:05.621000 465348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 465418 2025-12-04T13:24:33.8112654Z I1204 13:20:05.621000 465348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 465419 2025-12-04T13:24:33.8112804Z I1204 13:20:05.622000 465348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 465420 2025-12-04T13:24:33.8113393Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8113431Z _warn_cpu_init() 2025-12-04T13:24:33.8114018Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.8114078Z _warn_cpu_init() 2025-12-04T13:24:33.8114639Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8114677Z _warn_cpu_init() 2025-12-04T13:24:33.8115236Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8115274Z _warn_cpu_init() 2025-12-04T13:24:33.8115565Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8115608Z return func(*args, **kwargs) 2025-12-04T13:24:33.8115753Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8115915Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8116209Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8116363Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8116648Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8116784Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8117063Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8117212Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8117487Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8117645Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8117920Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8118068Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8118357Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8118505Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8118997Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:24:33.8119113Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8119309Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8119679Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8119839Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8120051Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8120216Z [rank2]:E1204 13:20:13.201000 465419 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8120257Z dist init r=2, world=4 2025-12-04T13:24:33.8120394Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8120553Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8120840Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8121012Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8121297Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8121423Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8121698Z [rank3]:E1204 13:20:13.243000 465420 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8121846Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8122136Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8122294Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8122590Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8122724Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8123001Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8123150Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8123640Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 
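On the recurring `_warn_cpu_init` UserWarning from _init_utils.py: the warning names its own fix, passing `device_id` to FSDP so sharding initialization runs on GPU (also required for `sync_module_states=True`). A minimal sketch; `model` is a placeholder module, not the test's nested wrapped model:

    # Hedged sketch: GPU-side FSDP sharding init, per the UserWarning.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = torch.nn.Linear(8, 8)                # placeholder CPU module
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),   # moves module to GPU for sharding init
        sync_module_states=True,                 # needs the module on GPU, per the warning
    )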
2025-12-04T13:24:33.8123756Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8123950Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8124320Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8124435Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8124646Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8124811Z [rank3]:E1204 13:20:13.243000 465420 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8124850Z dist init r=3, world=4 2025-12-04T13:24:33.8124987Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8125146Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8125442Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8125597Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8125881Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8126004Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8126296Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8126444Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8126730Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8126889Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8127164Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8127300Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8127578Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8127727Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8128212Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:24:33.8128326Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8128525Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8128893Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8129009Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8129219Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8129383Z [rank0]:E1204 13:20:13.255000 465417 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8129421Z dist init r=0, world=4 2025-12-04T13:24:33.8129568Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8129761Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8130047Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8130202Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8130499Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8130625Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8130920Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8131079Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:24:33.8131356Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8131502Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8131779Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8131915Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8132193Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8132340Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8132826Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:24:33.8132941Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8133135Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8133502Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8133615Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8133839Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8134004Z [rank1]:E1204 13:20:13.258000 465418 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8134044Z dist init r=1, world=4 2025-12-04T13:24:33.8134381Z [rank0]:[W1204 13:20:13.020943129 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8134420Z FAILED [9.4188s] [100%] 2025-12-04T13:24:33.8134422Z 2025-12-04T13:24:33.8134477Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8134586Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _ 2025-12-04T13:24:33.8134641Z Traceback (most recent call last): 2025-12-04T13:24:33.8134804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8134858Z self._join_processes(fn) 2025-12-04T13:24:33.8135030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8135099Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8135277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8135321Z raise RuntimeError(error) 2025-12-04T13:24:33.8135400Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8135445Z Traceback (most recent call last): 2025-12-04T13:24:33.8135606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8135649Z getattr(self, test_name)() 2025-12-04T13:24:33.8135807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8135844Z fn() 2025-12-04T13:24:33.8135994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8136037Z method(*args, **kwargs) 2025-12-04T13:24:33.8136187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8136228Z method(*args, **kwargs) 2025-12-04T13:24:33.8136377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8136415Z with policy(): 2025-12-04T13:24:33.8136567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8136608Z raise RuntimeError(msg) 2025-12-04T13:24:33.8136973Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 
2025-12-04T13:24:33.8136977Z 2025-12-04T13:24:33.8137051Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8137295Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8137298Z 2025-12-04T13:24:33.8137384Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8137387Z 2025-12-04T13:24:33.8137388Z 2025-12-04T13:24:33.8137464Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8137563Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8137797Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e015ef1a1cd9270.xml - 2025-12-04T13:24:33.8137860Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8138117Z FAILED [9.4188s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8138164Z Traceback (most recent call last): 2025-12-04T13:24:33.8138326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8138370Z getattr(self, test_name)() 2025-12-04T13:24:33.8138539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8138575Z fn() 2025-12-04T13:24:33.8138735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8138777Z method(*args, **kwargs) 2025-12-04T13:24:33.8138939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8138979Z method(*args, **kwargs) 2025-12-04T13:24:33.8139127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8139164Z with policy(): 2025-12-04T13:24:33.8139315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8139357Z raise RuntimeError(msg) 2025-12-04T13:24:33.8139759Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:24:33.8139764Z 2025-12-04T13:24:33.8139839Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8140081Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8140083Z 2025-12-04T13:24:33.8140168Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8140231Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:24:33.8140291Z ======================= 1 failed, 20 deselected in 9.57s ======================= 2025-12-04T13:24:33.8140330Z Got exit code 1 2025-12-04T13:24:33.8140520Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:24:33.8140650Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.8140836Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-066bb6d612eba5d1.xml 2025-12-04T13:24:33.8140894Z ============================= test session starts ============================== 2025-12-04T13:24:33.8141006Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8141048Z cachedir: .pytest_cache 2025-12-04T13:24:33.8141204Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8141251Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8141292Z configfile: pytest.ini 2025-12-04T13:24:33.8141469Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8141544Z collecting ... collected 60 items / 14 deselected / 46 selected 2025-12-04T13:24:33.8141599Z stepcurrent: skipping 14 already run items. 2025-12-04T13:24:33.8141644Z Running 7 items in this shard 2025-12-04T13:24:33.8141647Z 2025-12-04T13:24:33.8141957Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 13:20:17.680000 465750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 465819 2025-12-04T13:24:33.8142111Z I1204 13:20:17.681000 465750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 465820 2025-12-04T13:24:33.8142276Z I1204 13:20:17.682000 465750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 465821 2025-12-04T13:24:33.8142427Z I1204 13:20:17.682000 465750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 465822 2025-12-04T13:24:33.8142734Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8142802Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8143087Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8143137Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8143710Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.8143749Z _warn_cpu_init() 2025-12-04T13:24:33.8144316Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8144353Z _warn_cpu_init() 2025-12-04T13:24:33.8144642Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8144721Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8145007Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8145084Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8145372Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8145415Z return func(*args, **kwargs) 2025-12-04T13:24:33.8145712Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8145762Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8146329Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8146369Z _warn_cpu_init() 2025-12-04T13:24:33.8146654Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8146718Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8147284Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8147344Z _warn_cpu_init() 2025-12-04T13:24:33.8147629Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8147702Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8147988Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8148062Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8148293Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8148338Z return func(*args, **kwargs) 2025-12-04T13:24:33.8148562Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8148605Z return func(*args, **kwargs) 2025-12-04T13:24:33.8148829Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8148874Z return func(*args, **kwargs) 2025-12-04T13:24:33.8149095Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8149137Z return func(*args, **kwargs) 2025-12-04T13:24:33.8149356Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8149398Z return func(*args, **kwargs) 2025-12-04T13:24:33.8149616Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8149657Z return func(*args, **kwargs) 2025-12-04T13:24:33.8149911Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8149966Z return func(*args, **kwargs) 2025-12-04T13:24:33.8150186Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.8150227Z return func(*args, **kwargs) 2025-12-04T13:24:33.8150372Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8150534Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8150823Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8150996Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8151286Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8151435Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8151713Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8151862Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8152139Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8152288Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8152561Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8152699Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8152976Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8153125Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8153613Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:24:33.8153730Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8153926Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8154300Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8154415Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8154626Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8154791Z [rank0]:E1204 13:20:25.204000 465819 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8154830Z dist init r=0, world=4 2025-12-04T13:24:33.8154969Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8155129Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8155425Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8155589Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8155886Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8156011Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8156287Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8156435Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8156712Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8156859Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8157134Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8157270Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8157549Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8157696Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8158179Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:24:33.8158295Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8158499Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8158862Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8158976Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8159187Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8159350Z [rank3]:E1204 13:20:25.205000 465822 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8159399Z dist init r=3, world=4 2025-12-04T13:24:33.8159538Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8159737Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8160041Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8160193Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8160476Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8160600Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8160877Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8161026Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.8161304Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8161450Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8161726Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8161864Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8162141Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8162288Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8162790Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:24:33.8162907Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8163101Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8163463Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8163579Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8163802Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8163967Z [rank2]:E1204 13:20:25.206000 465821 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8164020Z dist init r=2, world=4 2025-12-04T13:24:33.8164170Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8164328Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8164613Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8164767Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:24:33.8165050Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8165175Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8165452Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8165599Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8165876Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8166023Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8166303Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8166438Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8166715Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8166862Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8167351Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 103936 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:24:33.8167467Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8167662Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8168033Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8168149Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8168372Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8168546Z [rank1]:E1204 13:20:25.209000 465820 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8168585Z dist init r=1, world=4 2025-12-04T13:24:33.8168919Z [rank0]:[W1204 13:20:25.884827584 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8168959Z FAILED [9.3160s] [ 14%] 2025-12-04T13:24:33.8168962Z 2025-12-04T13:24:33.8169018Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8169120Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T13:24:33.8169168Z Traceback (most recent call last): 2025-12-04T13:24:33.8169330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8169375Z self._join_processes(fn) 2025-12-04T13:24:33.8169547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8169602Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8169812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8169857Z raise RuntimeError(error) 2025-12-04T13:24:33.8169938Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8169985Z Traceback (most recent call last): 2025-12-04T13:24:33.8170146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8170190Z getattr(self, test_name)() 2025-12-04T13:24:33.8170348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8170384Z fn() 2025-12-04T13:24:33.8170534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8170575Z method(*args, **kwargs) 2025-12-04T13:24:33.8170724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8170765Z method(*args, **kwargs) 2025-12-04T13:24:33.8170915Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8170970Z with policy(): 2025-12-04T13:24:33.8171122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8171165Z raise RuntimeError(msg) 2025-12-04T13:24:33.8171520Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:24:33.8171524Z 2025-12-04T13:24:33.8171599Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8171835Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8171851Z 2025-12-04T13:24:33.8171939Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8171942Z 2025-12-04T13:24:33.8171958Z 2025-12-04T13:24:33.8172033Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8172121Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8172369Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-066bb6d612eba5d1.xml - 2025-12-04T13:24:33.8172430Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8172682Z FAILED [9.3160s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8172730Z Traceback (most recent call last): 2025-12-04T13:24:33.8172896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8172940Z getattr(self, test_name)() 2025-12-04T13:24:33.8173098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8173134Z fn() 2025-12-04T13:24:33.8173284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8173325Z method(*args, **kwargs) 2025-12-04T13:24:33.8173475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8173515Z method(*args, **kwargs) 2025-12-04T13:24:33.8173663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8173700Z with policy(): 2025-12-04T13:24:33.8173852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8173893Z raise RuntimeError(msg) 2025-12-04T13:24:33.8174251Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 
2025-12-04T13:24:33.8174254Z 2025-12-04T13:24:33.8174328Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8174562Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8174565Z 2025-12-04T13:24:33.8174651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8174715Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8174789Z ======================= 1 failed, 14 deselected in 9.48s ======================= 2025-12-04T13:24:33.8174830Z Got exit code 1 2025-12-04T13:24:33.8174870Z Retrying single test... 2025-12-04T13:24:33.8175059Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d1bb92e7b3074b9.xml 2025-12-04T13:24:33.8175116Z ============================= test session starts ============================== 2025-12-04T13:24:33.8175228Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8175269Z cachedir: .pytest_cache 2025-12-04T13:24:33.8175427Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8175698Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8178023Z configfile: pytest.ini 2025-12-04T13:24:33.8178228Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8178317Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8178550Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8178607Z Running 1 items in this shard 2025-12-04T13:24:33.8178610Z 2025-12-04T13:24:33.8178923Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 13:20:29.509000 466152 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 466221 2025-12-04T13:24:33.8179078Z I1204 13:20:29.510000 466152 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 466222 2025-12-04T13:24:33.8179232Z I1204 13:20:29.511000 466152 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 466223 2025-12-04T13:24:33.8179382Z I1204 13:20:29.511000 466152 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 466224 2025-12-04T13:24:33.8179684Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8179786Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8180367Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8180408Z _warn_cpu_init() 2025-12-04T13:24:33.8180694Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8180748Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8181029Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8181078Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8181667Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8181707Z _warn_cpu_init() 2025-12-04T13:24:33.8182269Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8182306Z _warn_cpu_init() 2025-12-04T13:24:33.8182606Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8182687Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8182988Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8183081Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8183365Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8183439Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8183722Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8183773Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8184062Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:24:33.8184108Z return func(*args, **kwargs) 2025-12-04T13:24:33.8184677Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8184716Z _warn_cpu_init() 2025-12-04T13:24:33.8185002Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8185076Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8185306Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8185349Z return func(*args, **kwargs) 2025-12-04T13:24:33.8185572Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8185613Z return func(*args, **kwargs) 2025-12-04T13:24:33.8185834Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8185886Z return func(*args, **kwargs) 2025-12-04T13:24:33.8186107Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8186149Z return func(*args, **kwargs) 2025-12-04T13:24:33.8186367Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8186409Z return func(*args, **kwargs) 2025-12-04T13:24:33.8186626Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8186667Z return func(*args, **kwargs) 2025-12-04T13:24:33.8186898Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8186950Z return func(*args, **kwargs) 2025-12-04T13:24:33.8187168Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.8187221Z return func(*args, **kwargs) 2025-12-04T13:24:33.8187367Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8187531Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8187821Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8187980Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8188264Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8188392Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8188671Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8188820Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8189101Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8189250Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8189526Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8189662Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8189983Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8190145Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8190632Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:24:33.8190751Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8190947Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8191329Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8191463Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8191699Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8191865Z [rank1]:E1204 13:20:37.178000 466222 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8191906Z dist init r=1, world=4 2025-12-04T13:24:33.8192045Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8192206Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8192495Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8192651Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8192936Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8193062Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8193338Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8193486Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8193761Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8193908Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8194184Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8194321Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8194610Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8194759Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8195241Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 103936 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:24:33.8195367Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8195564Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8195937Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8196064Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8196276Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8196439Z [rank2]:E1204 13:20:37.185000 466223 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8196479Z dist init r=2, world=4 2025-12-04T13:24:33.8196618Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8196778Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8197063Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8197218Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8197504Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8197631Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8197908Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8198056Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.8198331Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8198477Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8198764Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8198901Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8199179Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8199328Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8199864Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:24:33.8199992Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8200200Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8200561Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8200675Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8200889Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8201055Z [rank3]:E1204 13:20:37.199000 466224 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8201094Z dist init r=3, world=4 2025-12-04T13:24:33.8201232Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8201391Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8201676Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8201830Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:24:33.8202116Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8202240Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8202519Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8202666Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8202956Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8203104Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8203381Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8203516Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8203792Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8203952Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8204439Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:24:33.8204563Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8204760Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8205124Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8205239Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8205449Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8205614Z [rank0]:E1204 13:20:37.236000 466221 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8205653Z dist init r=0, world=4 2025-12-04T13:24:33.8205989Z [rank0]:[W1204 13:20:37.004978346 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8206029Z FAILED [9.6163s] [100%] 2025-12-04T13:24:33.8206031Z 2025-12-04T13:24:33.8206088Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8206191Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T13:24:33.8206238Z Traceback (most recent call last): 2025-12-04T13:24:33.8206401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8206445Z self._join_processes(fn) 2025-12-04T13:24:33.8206618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8206671Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8206849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8206893Z raise RuntimeError(error) 2025-12-04T13:24:33.8206986Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8207032Z Traceback (most recent call last): 2025-12-04T13:24:33.8207193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8207236Z getattr(self, test_name)() 2025-12-04T13:24:33.8207394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8207429Z fn() 2025-12-04T13:24:33.8207581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8207622Z method(*args, **kwargs) 2025-12-04T13:24:33.8207772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8207824Z method(*args, **kwargs) 2025-12-04T13:24:33.8207975Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8208030Z with policy(): 2025-12-04T13:24:33.8208182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8208238Z raise RuntimeError(msg) 2025-12-04T13:24:33.8208592Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:24:33.8208595Z 2025-12-04T13:24:33.8208672Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8208910Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8208912Z 2025-12-04T13:24:33.8209000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8209003Z 2025-12-04T13:24:33.8209062Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8209112Z Traceback (most recent call last): 2025-12-04T13:24:33.8209272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8209316Z getattr(self, test_name)() 2025-12-04T13:24:33.8209475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8209510Z fn() 2025-12-04T13:24:33.8209661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8209766Z method(*args, **kwargs) 2025-12-04T13:24:33.8209919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8209960Z method(*args, **kwargs) 2025-12-04T13:24:33.8210110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8210150Z with policy(): 2025-12-04T13:24:33.8210301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8210342Z raise RuntimeError(msg) 2025-12-04T13:24:33.8210698Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 
2025-12-04T13:24:33.8210700Z 2025-12-04T13:24:33.8210775Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8211026Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8211029Z 2025-12-04T13:24:33.8211115Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8211119Z 2025-12-04T13:24:33.8211121Z 2025-12-04T13:24:33.8211197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8211286Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8211519Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d1bb92e7b3074b9.xml - 2025-12-04T13:24:33.8211580Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8211850Z FAILED [9.6163s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8211910Z Traceback (most recent call last): 2025-12-04T13:24:33.8212071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8212129Z getattr(self, test_name)() 2025-12-04T13:24:33.8212287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8212322Z fn() 2025-12-04T13:24:33.8212472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8212512Z method(*args, **kwargs) 2025-12-04T13:24:33.8212660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8212703Z method(*args, **kwargs) 2025-12-04T13:24:33.8212853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8212891Z with policy(): 2025-12-04T13:24:33.8213043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8213085Z raise RuntimeError(msg) 2025-12-04T13:24:33.8213440Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:24:33.8213442Z 2025-12-04T13:24:33.8213515Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8213749Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8213753Z 2025-12-04T13:24:33.8213838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8213841Z 2025-12-04T13:24:33.8213900Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8213946Z Traceback (most recent call last): 2025-12-04T13:24:33.8214108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8214150Z getattr(self, test_name)() 2025-12-04T13:24:33.8214309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8214343Z fn() 2025-12-04T13:24:33.8214494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8214534Z method(*args, **kwargs) 2025-12-04T13:24:33.8214695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8214737Z method(*args, **kwargs) 2025-12-04T13:24:33.8214889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8214927Z with policy(): 2025-12-04T13:24:33.8215078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8215120Z raise RuntimeError(msg) 2025-12-04T13:24:33.8215471Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:24:33.8215473Z 2025-12-04T13:24:33.8215559Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8215793Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8215809Z 2025-12-04T13:24:33.8215896Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8215969Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8216032Z ======================= 1 failed, 20 deselected in 9.77s ======================= 2025-12-04T13:24:33.8216069Z Got exit code 1 2025-12-04T13:24:33.8216110Z Retrying single test... 
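Every rank's traceback above bottoms out in common_utils.py, line 2705, in __exit__: the mem_leak_check wrapper is a context manager that snapshots CUDA allocator and driver memory before the test body and raises on exit if usage has not returned to baseline. A minimal sketch of that pattern, assuming a CUDA/ROCm device is available; the class name and message wording are illustrative, not PyTorch's actual CudaMemoryLeakCheck implementation:

import torch

class MemLeakCheck:
    # Hypothetical stand-in for the leak checker seen in the tracebacks:
    # snapshot memory on __enter__, re-check on __exit__, raise on growth.
    def __init__(self, device: int = 0):
        self.device = device

    def _driver_used(self) -> int:
        free, total = torch.cuda.mem_get_info(self.device)  # driver-level view
        return total - free

    def __enter__(self):
        torch.cuda.synchronize(self.device)
        torch.cuda.empty_cache()
        self.alloc_before = torch.cuda.memory_allocated(self.device)
        self.driver_before = self._driver_used()
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # let the test's own exception propagate unchanged
        torch.cuda.synchronize(self.device)
        alloc_after = torch.cuda.memory_allocated(self.device)
        if alloc_after > self.alloc_before and self._driver_used() > self.driver_before:
            # Mirrors the log's wording: caching-allocator growth that the
            # driver API confirms counts as a leak.
            raise RuntimeError(
                f"possible leak on device {self.device}: caching allocator "
                f"{self.alloc_before} -> {alloc_after} bytes"
            )
        return False

This is why the failure surfaces as a RuntimeError raised from __exit__ rather than from an assertion inside the test body itself.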
2025-12-04T13:24:33.8216299Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a8861b396c615a20.xml 2025-12-04T13:24:33.8216358Z ============================= test session starts ============================== 2025-12-04T13:24:33.8216473Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8216515Z cachedir: .pytest_cache 2025-12-04T13:24:33.8216675Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8216723Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8216765Z configfile: pytest.ini 2025-12-04T13:24:33.8216927Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8217002Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8217230Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8217275Z Running 1 items in this shard 2025-12-04T13:24:33.8217277Z 2025-12-04T13:24:33.8217588Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 13:20:41.595000 466554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 466623 2025-12-04T13:24:33.8217743Z I1204 13:20:41.596000 466554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 466624 2025-12-04T13:24:33.8217895Z I1204 13:20:41.597000 466554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 466625 2025-12-04T13:24:33.8218044Z I1204 13:20:41.598000 466554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 466626 2025-12-04T13:24:33.8218335Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8218385Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8218686Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8218736Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8219019Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8219068Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8219657Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.8219738Z _warn_cpu_init() 2025-12-04T13:24:33.8220325Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8220377Z _warn_cpu_init() 2025-12-04T13:24:33.8220939Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8220978Z _warn_cpu_init() 2025-12-04T13:24:33.8221262Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8221313Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:24:33.8221874Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8221910Z _warn_cpu_init() 2025-12-04T13:24:33.8222197Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8222277Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8222560Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8222635Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8222919Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:24:33.8223006Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8223292Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:24:33.8223366Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:24:33.8223656Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8223701Z return func(*args, **kwargs) 2025-12-04T13:24:33.8223929Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8223992Z return func(*args, **kwargs) 2025-12-04T13:24:33.8224216Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8224270Z return func(*args, **kwargs) 2025-12-04T13:24:33.8224501Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8224543Z return func(*args, **kwargs) 2025-12-04T13:24:33.8224762Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8224804Z return func(*args, **kwargs) 2025-12-04T13:24:33.8225024Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8225065Z return func(*args, **kwargs) 2025-12-04T13:24:33.8225284Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8225324Z return func(*args, **kwargs) 2025-12-04T13:24:33.8225543Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:24:33.8225583Z return func(*args, **kwargs) 2025-12-04T13:24:33.8225802Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:24:33.8225842Z return func(*args, **kwargs) 2025-12-04T13:24:33.8225989Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8226154Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8226444Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8226600Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8226887Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8227013Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8227301Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8227451Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8227728Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8227876Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8228168Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8228306Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8228596Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8228753Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8229237Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:24:33.8229354Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8229551Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8229960Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8230075Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8230289Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8230454Z [rank2]:E1204 13:20:49.410000 466625 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8230496Z dist init r=2, world=4 2025-12-04T13:24:33.8230633Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8230795Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8231082Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8231235Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8231533Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8231771Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8232049Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8232195Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8232470Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8232635Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8232926Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8233074Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8233353Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8233502Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8233980Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:24:33.8234097Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8234291Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8234653Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8234770Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8234981Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8235145Z [rank3]:E1204 13:20:49.420000 466626 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8235187Z dist init r=3, world=4 2025-12-04T13:24:33.8235325Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8235482Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8235768Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8235931Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8236218Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8236342Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8236618Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8236774Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.8237050Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8237208Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8237491Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8237626Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8237903Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8238052Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8238533Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 103936 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:24:33.8238648Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8238843Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8239203Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8239320Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8239532Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8239816Z [rank1]:E1204 13:20:49.451000 466624 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8239856Z dist init r=1, world=4 2025-12-04T13:24:33.8239993Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8240172Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8240457Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8240611Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:24:33.8240892Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8241016Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8241305Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8241467Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8241765Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8241911Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8242188Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8242325Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8242603Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8242751Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8243229Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:24:33.8243345Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8243540Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8243905Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8244017Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8244227Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8244390Z [rank0]:E1204 13:20:49.482000 466623 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8244440Z dist init r=0, world=4 2025-12-04T13:24:33.8244777Z [rank0]:[W1204 13:20:49.367551674 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8244817Z FAILED [9.8151s] [100%] 2025-12-04T13:24:33.8244821Z 2025-12-04T13:24:33.8244878Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8244977Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T13:24:33.8245024Z Traceback (most recent call last): 2025-12-04T13:24:33.8245186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8245241Z self._join_processes(fn) 2025-12-04T13:24:33.8245415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8245481Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8245658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8245716Z raise RuntimeError(error) 2025-12-04T13:24:33.8245796Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8245841Z Traceback (most recent call last): 2025-12-04T13:24:33.8246001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8246044Z getattr(self, test_name)() 2025-12-04T13:24:33.8246201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8246237Z fn() 2025-12-04T13:24:33.8246390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8246432Z method(*args, **kwargs) 2025-12-04T13:24:33.8246581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8246623Z method(*args, **kwargs) 2025-12-04T13:24:33.8246773Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8246810Z with policy(): 2025-12-04T13:24:33.8246961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8247002Z raise RuntimeError(msg) 2025-12-04T13:24:33.8247362Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 103936 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:24:33.8247366Z 2025-12-04T13:24:33.8247441Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8247675Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8247678Z 2025-12-04T13:24:33.8247765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8247767Z 2025-12-04T13:24:33.8247769Z 2025-12-04T13:24:33.8247844Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8247932Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8248163Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a8861b396c615a20.xml - 2025-12-04T13:24:33.8248235Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8248489Z FAILED [9.8151s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8248537Z Traceback (most recent call last): 2025-12-04T13:24:33.8248698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8248741Z getattr(self, test_name)() 2025-12-04T13:24:33.8248899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8248934Z fn() 2025-12-04T13:24:33.8249094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8249135Z method(*args, **kwargs) 2025-12-04T13:24:33.8249286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8249337Z method(*args, **kwargs) 2025-12-04T13:24:33.8249486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8249535Z with policy(): 2025-12-04T13:24:33.8249724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8249766Z raise RuntimeError(msg) 2025-12-04T13:24:33.8250121Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 103936 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:24:33.8250124Z 2025-12-04T13:24:33.8250199Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8250433Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8250436Z 2025-12-04T13:24:33.8250522Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8250584Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8250644Z ======================= 1 failed, 20 deselected in 9.97s ======================= 2025-12-04T13:24:33.8250682Z Got exit code 1 2025-12-04T13:24:33.8250863Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:24:33.8250993Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.8251181Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0a4180a1435a7d8.xml 2025-12-04T13:24:33.8251238Z ============================= test session starts ============================== 2025-12-04T13:24:33.8251350Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8251393Z cachedir: .pytest_cache 2025-12-04T13:24:33.8251549Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8251597Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8251636Z configfile: pytest.ini 2025-12-04T13:24:33.8251799Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8251875Z collecting ... collected 60 items / 15 deselected / 45 selected 2025-12-04T13:24:33.8251929Z stepcurrent: skipping 15 already run items. 2025-12-04T13:24:33.8251991Z Running 6 items in this shard 2025-12-04T13:24:33.8251994Z 2025-12-04T13:24:33.8252301Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 13:20:54.023000 466956 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 467025 2025-12-04T13:24:33.8252456Z I1204 13:20:54.024000 466956 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 467026 2025-12-04T13:24:33.8252607Z I1204 13:20:54.025000 466956 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 467027 2025-12-04T13:24:33.8252757Z I1204 13:20:54.026000 466956 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 467028 2025-12-04T13:24:33.8253346Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
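The UserWarnings repeated through both sessions name their own remedies: pass device_id to init_process_group so barrier() does not have to infer a device, and pass device_id to FSDP so a CPU-resident module is moved to the GPU before sharding initialization (avoiding _warn_cpu_init() and keeping sync_module_states=True usable). A hedged sketch of both, assuming torchrun-style environment variables (LOCAL_RANK, RANK, WORLD_SIZE, MASTER_ADDR); the toy Linear module stands in for the test model:

import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

rank = int(os.environ["LOCAL_RANK"])  # provided by torchrun-style launchers
torch.cuda.set_device(rank)

# An explicit device here silences "barrier(): using the device under
# current context" from c10d_logger.py.
dist.init_process_group("nccl", device_id=torch.device(f"cuda:{rank}"))

model = torch.nn.Linear(8, 8)  # illustrative module only

# device_id moves the CPU module to the GPU before sharding initialization,
# which is exactly what the _init_utils.py:1014 warning recommends.
fsdp_model = FSDP(model, device_id=rank)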
2025-12-04T13:24:33.8253417Z _warn_cpu_init() 2025-12-04T13:24:33.8253980Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8254017Z _warn_cpu_init() 2025-12-04T13:24:33.8254310Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8254354Z return func(*args, **kwargs) 2025-12-04T13:24:33.8254916Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8254955Z _warn_cpu_init() 2025-12-04T13:24:33.8255515Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.8255553Z _warn_cpu_init() 2025-12-04T13:24:33.8255697Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8255860Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8256146Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8256302Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8256600Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8256727Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8257004Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8257151Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8257440Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8257586Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8257874Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8258022Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8258299Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8258448Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8258928Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:24:33.8259046Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8259240Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8259599Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8259767Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8259980Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8260145Z [rank0]:E1204 13:21:01.844000 467025 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8260184Z dist init r=0, world=4 2025-12-04T13:24:33.8260321Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8260480Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8260781Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8260934Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8261219Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8261342Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8261633Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8261782Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8262071Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8262230Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8262503Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8262640Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8262917Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8263065Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8263541Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:24:33.8263655Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8263851Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8264208Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8264327Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8264538Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8264701Z [rank1]:E1204 13:21:01.855000 467026 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8264741Z dist init r=1, world=4 2025-12-04T13:24:33.8264878Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8265048Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8265334Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8265489Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8265771Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8265907Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8266186Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8266352Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.8266626Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8266771Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8267050Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8267185Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8267463Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8267611Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8268084Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:24:33.8268198Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8268393Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8268748Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8268862Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8269073Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8269248Z [rank3]:E1204 13:21:01.895000 467028 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8269289Z dist init r=3, world=4 2025-12-04T13:24:33.8269426Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8269586Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8269916Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8270068Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:24:33.8270371Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8270513Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8270802Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8270948Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8271224Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8271372Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8271647Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8271783Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8272062Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8272210Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8272685Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:24:33.8272800Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8272994Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8273348Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8273475Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8273687Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8273851Z [rank2]:E1204 13:21:01.920000 467027 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8273890Z dist init r=2, world=4 2025-12-04T13:24:33.8274223Z [rank0]:[W1204 13:21:02.538588621 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8274263Z FAILED [9.7165s] [ 16%] 2025-12-04T13:24:33.8274265Z 2025-12-04T13:24:33.8274331Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8274433Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T13:24:33.8274490Z Traceback (most recent call last): 2025-12-04T13:24:33.8274653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8274707Z self._join_processes(fn) 2025-12-04T13:24:33.8274880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8274934Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8275114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8275158Z raise RuntimeError(error) 2025-12-04T13:24:33.8275238Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8275284Z Traceback (most recent call last): 2025-12-04T13:24:33.8275446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8275489Z getattr(self, test_name)() 2025-12-04T13:24:33.8275646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8275683Z fn() 2025-12-04T13:24:33.8275834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8275875Z method(*args, **kwargs) 2025-12-04T13:24:33.8276026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8276068Z method(*args, **kwargs) 2025-12-04T13:24:33.8276218Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8276257Z with policy(): 2025-12-04T13:24:33.8276409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8276452Z raise RuntimeError(msg) 2025-12-04T13:24:33.8276802Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:24:33.8276805Z 2025-12-04T13:24:33.8276881Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8277107Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8277110Z 2025-12-04T13:24:33.8277198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8277200Z 2025-12-04T13:24:33.8277271Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8277318Z Traceback (most recent call last): 2025-12-04T13:24:33.8277478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8277522Z getattr(self, test_name)() 2025-12-04T13:24:33.8277679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8277713Z fn() 2025-12-04T13:24:33.8277864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8277904Z method(*args, **kwargs) 2025-12-04T13:24:33.8278053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8278103Z method(*args, **kwargs) 2025-12-04T13:24:33.8278255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8278303Z with policy(): 2025-12-04T13:24:33.8278454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8278505Z raise RuntimeError(msg) 2025-12-04T13:24:33.8278852Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
2025-12-04T13:24:33.8278855Z 2025-12-04T13:24:33.8278928Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8279155Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8279157Z 2025-12-04T13:24:33.8279244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8279248Z 2025-12-04T13:24:33.8279250Z 2025-12-04T13:24:33.8279325Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8279414Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8279645Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0a4180a1435a7d8.xml - 2025-12-04T13:24:33.8279745Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8279991Z FAILED [9.7165s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8280039Z Traceback (most recent call last): 2025-12-04T13:24:33.8280202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8280245Z getattr(self, test_name)() 2025-12-04T13:24:33.8280402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8280439Z fn() 2025-12-04T13:24:33.8280589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8280629Z method(*args, **kwargs) 2025-12-04T13:24:33.8280778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8280818Z method(*args, **kwargs) 2025-12-04T13:24:33.8280966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8281004Z with policy(): 2025-12-04T13:24:33.8281171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8281214Z raise RuntimeError(msg) 2025-12-04T13:24:33.8281562Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
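The parent-process traceback above shows the multi-process test pattern: spawn one worker per rank, join them, and fail the test if any worker exits nonzero (here, exit code 10). A bare-bones sketch of that pattern using the standard library, simplified from what _join_processes and _check_return_codes do in common_distributed.py:

    import multiprocessing as mp

    def _worker(rank, world_size):
        # Stand-in for run_test(); a failed check exits the rank with code 10.
        raise SystemExit(10)

    def run_multiprocess_test(world_size=4):
        procs = [mp.Process(target=_worker, args=(r, world_size))
                 for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Mirror _check_return_codes: any nonzero exit fails the parent test.
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(
                    f"Process {rank} exited with error code {p.exitcode}")

    if __name__ == "__main__":
        run_multiprocess_test()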
2025-12-04T13:24:33.8281565Z 2025-12-04T13:24:33.8281638Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8281865Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8281867Z 2025-12-04T13:24:33.8281966Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8281968Z 2025-12-04T13:24:33.8282028Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8282088Z Traceback (most recent call last): 2025-12-04T13:24:33.8282249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8282309Z getattr(self, test_name)() 2025-12-04T13:24:33.8282468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8282502Z fn() 2025-12-04T13:24:33.8282654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8282694Z method(*args, **kwargs) 2025-12-04T13:24:33.8282844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8282885Z method(*args, **kwargs) 2025-12-04T13:24:33.8283033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8283072Z with policy(): 2025-12-04T13:24:33.8283221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8283264Z raise RuntimeError(msg) 2025-12-04T13:24:33.8283609Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:24:33.8283612Z 2025-12-04T13:24:33.8283684Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8283910Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8283912Z 2025-12-04T13:24:33.8283999Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8284062Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8284124Z ======================= 1 failed, 15 deselected in 9.86s ======================= 2025-12-04T13:24:33.8284161Z Got exit code 1 2025-12-04T13:24:33.8284203Z Retrying single test... 
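"Got exit code 1" followed by "Retrying single test..." is the CI wrapper rerunning only the failing test in a fresh pytest session before giving up. A simplified sketch of that retry policy (the command line is illustrative; the real wrapper also manages stepcurrent state and the XML report paths seen in this log):

    import subprocess
    import sys

    TEST_ID = ("distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::"
               "test_nested_wrapped_model_offload_true_none_cuda")

    def run_with_retries(test_id, max_retries=2):
        # Rerun a single failing test in a fresh interpreter; a flaky
        # failure would pass on a later attempt.
        cmd = [sys.executable, "-m", "pytest", "-v", "-x", test_id]
        for _ in range(1 + max_retries):
            result = subprocess.run(cmd)
            print(f"Got exit code {result.returncode}")
            if result.returncode == 0:
                return True
            print("Retrying single test...")
        return False

Here the retried run reproduces the same leak report, so the failure is consistent rather than flaky.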
2025-12-04T13:24:33.8284391Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4bc6f848e69ba66f.xml 2025-12-04T13:24:33.8284449Z ============================= test session starts ============================== 2025-12-04T13:24:33.8284562Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8284603Z cachedir: .pytest_cache 2025-12-04T13:24:33.8284763Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8284819Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8284862Z configfile: pytest.ini 2025-12-04T13:24:33.8285024Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8285100Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8285322Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8285367Z Running 1 items in this shard 2025-12-04T13:24:33.8285369Z 2025-12-04T13:24:33.8285695Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 13:21:06.361000 467358 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 467427 2025-12-04T13:24:33.8285851Z I1204 13:21:06.361000 467358 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 467428 2025-12-04T13:24:33.8286012Z I1204 13:21:06.362000 467358 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 467429 2025-12-04T13:24:33.8286174Z I1204 13:21:06.363000 467358 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 467430 2025-12-04T13:24:33.8286750Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8286788Z _warn_cpu_init() 2025-12-04T13:24:33.8287351Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8287390Z _warn_cpu_init() 2025-12-04T13:24:33.8287956Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8287994Z _warn_cpu_init() 2025-12-04T13:24:33.8288557Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8288596Z _warn_cpu_init() 2025-12-04T13:24:33.8288884Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8288929Z return func(*args, **kwargs) 2025-12-04T13:24:33.8289072Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8289249Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8289537Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8289728Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8290013Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8290154Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8290433Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8290592Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8290882Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8291027Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8291303Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8291440Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8291717Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8291865Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8292344Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:24:33.8292462Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8292658Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8293014Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8293128Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8293340Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8293518Z [rank2]:E1204 13:21:14.139000 467429 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8293558Z dist init r=2, world=4 2025-12-04T13:24:33.8293695Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8293855Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8294140Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8294293Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8294586Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8294724Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8295010Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8295157Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8295433Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8295580Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8295856Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8295991Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8296267Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8296414Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8296889Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:24:33.8297005Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8297201Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8297555Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8297679Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8297893Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8298059Z [rank3]:E1204 13:21:14.149000 467430 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8298098Z dist init r=3, world=4 2025-12-04T13:24:33.8298235Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8298393Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8298691Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8298844Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8299136Z [rank0]:E1204 13:21:14.159000 467427 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8299269Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8299548Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8299739Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8300019Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8300166Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8300440Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8300576Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8300854Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8301001Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8301473Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:24:33.8301588Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8301782Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8302158Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8302275Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8302486Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8302649Z [rank0]:E1204 13:21:14.159000 467427 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8302687Z dist init r=0, world=4 2025-12-04T13:24:33.8302825Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8302999Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8303296Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8303463Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8303747Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8303871Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8304148Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8304295Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8304570Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8304717Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8304992Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8305129Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8305409Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8305560Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8306032Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:24:33.8306157Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8306352Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8306707Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8306820Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8307030Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8307203Z [rank1]:E1204 13:21:14.213000 467428 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8307243Z dist init r=1, world=4 2025-12-04T13:24:33.8307586Z [rank0]:[W1204 13:21:14.880378269 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8307637Z FAILED [9.7153s] [100%] 2025-12-04T13:24:33.8307639Z 2025-12-04T13:24:33.8307694Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8307795Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T13:24:33.8307840Z Traceback (most recent call last): 2025-12-04T13:24:33.8308005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8308049Z self._join_processes(fn) 2025-12-04T13:24:33.8308224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8308278Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8308455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8308500Z raise RuntimeError(error) 2025-12-04T13:24:33.8308580Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8308626Z Traceback (most recent call last): 2025-12-04T13:24:33.8308784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8308827Z getattr(self, test_name)() 2025-12-04T13:24:33.8308985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8309020Z fn() 2025-12-04T13:24:33.8309172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8309214Z method(*args, **kwargs) 2025-12-04T13:24:33.8309365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8309407Z method(*args, **kwargs) 2025-12-04T13:24:33.8309556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8309593Z with policy(): 2025-12-04T13:24:33.8309789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8309831Z raise RuntimeError(msg) 2025-12-04T13:24:33.8310196Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:24:33.8310199Z 2025-12-04T13:24:33.8310275Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8310500Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8310504Z 2025-12-04T13:24:33.8310591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8310593Z 2025-12-04T13:24:33.8310595Z 2025-12-04T13:24:33.8310670Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8310757Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8311003Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4bc6f848e69ba66f.xml - 2025-12-04T13:24:33.8311064Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8311325Z FAILED [9.7153s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8311383Z Traceback (most recent call last): 2025-12-04T13:24:33.8311547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8311589Z getattr(self, test_name)() 2025-12-04T13:24:33.8311747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8311781Z fn() 2025-12-04T13:24:33.8311932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8311972Z method(*args, **kwargs) 2025-12-04T13:24:33.8312122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8312163Z method(*args, **kwargs) 2025-12-04T13:24:33.8312313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8312353Z with policy(): 2025-12-04T13:24:33.8312504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8312545Z raise RuntimeError(msg) 2025-12-04T13:24:33.8312893Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:24:33.8312896Z 2025-12-04T13:24:33.8312971Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8313198Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8313202Z 2025-12-04T13:24:33.8313289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8313350Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
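The ProcessGroupNCCL warning repeated in this log ("destroy_process_group() was not called before program exit") is about shutdown hygiene: each rank should tear down the process group before exiting. A minimal sketch of the recommended teardown, assuming rank/world-size env vars from a torchrun-style launcher:

    import torch.distributed as dist

    def main():
        # Assumes rank/world-size env vars from a torchrun-style launcher.
        dist.init_process_group("nccl")
        try:
            pass  # test or training body goes here
        finally:
            # Explicit teardown avoids the "destroy_process_group() was not
            # called before program exit" warning and its resource leak.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()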
2025-12-04T13:24:33.8313412Z ======================= 1 failed, 20 deselected in 9.87s ======================= 2025-12-04T13:24:33.8313449Z Got exit code 1 2025-12-04T13:24:33.8313489Z Retrying single test... 2025-12-04T13:24:33.8313677Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd8f68ac728a21f0.xml 2025-12-04T13:24:33.8313736Z ============================= test session starts ============================== 2025-12-04T13:24:33.8313856Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8313899Z cachedir: .pytest_cache 2025-12-04T13:24:33.8314058Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8314104Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8314145Z configfile: pytest.ini 2025-12-04T13:24:33.8314306Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8314383Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8314605Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8314649Z Running 1 items in this shard 2025-12-04T13:24:33.8314661Z 2025-12-04T13:24:33.8314966Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 13:21:18.658000 467760 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 467829 2025-12-04T13:24:33.8315132Z I1204 13:21:18.659000 467760 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 467830 2025-12-04T13:24:33.8315300Z I1204 13:21:18.659000 467760 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 467831 2025-12-04T13:24:33.8315451Z I1204 13:21:18.660000 467760 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 467832 2025-12-04T13:24:33.8316030Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8316069Z _warn_cpu_init() 2025-12-04T13:24:33.8316633Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.8316672Z _warn_cpu_init() 2025-12-04T13:24:33.8317234Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8317274Z _warn_cpu_init() 2025-12-04T13:24:33.8317561Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8317605Z return func(*args, **kwargs) 2025-12-04T13:24:33.8318178Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8318217Z _warn_cpu_init() 2025-12-04T13:24:33.8318360Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8318523Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8318811Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8318966Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8319260Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8319394Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8319682Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8319867Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8320143Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8320292Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8320568Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8320706Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8320984Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8321131Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8321609Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:24:33.8321728Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8321923Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8322279Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8322396Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8322626Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8322791Z [rank0]:E1204 13:21:26.448000 467829 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8322832Z dist init r=0, world=4 2025-12-04T13:24:33.8322969Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8323127Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8323425Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8323580Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8323883Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8324025Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8324300Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8324448Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8324723Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8324871Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8325145Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8325281Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8325559Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8325706Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8326185Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:24:33.8326300Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8326495Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8326860Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8326974Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8327186Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8327348Z [rank2]:E1204 13:21:26.452000 467831 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8327388Z dist init r=2, world=4 2025-12-04T13:24:33.8327524Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8327694Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8327989Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8328153Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8328436Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8328558Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8328837Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8328984Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8329260Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8329405Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8329680Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8329848Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8330126Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8330274Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8330745Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 29184 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:24:33.8330860Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8331069Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8331427Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8331541Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8331752Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8331928Z [rank3]:E1204 13:21:26.480000 467832 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8331968Z dist init r=3, world=4 2025-12-04T13:24:33.8332126Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8332284Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8332583Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8332736Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8333021Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8333144Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8333419Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8333566Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.8333840Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8333988Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8334264Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8334401Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8334679Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8334827Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8335312Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:24:33.8335427Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8335622Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8335974Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8336097Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8336310Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8336483Z [rank1]:E1204 13:21:26.488000 467830 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8336535Z dist init r=1, world=4 2025-12-04T13:24:33.8336870Z [rank0]:[W1204 13:21:26.216779064 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8336910Z FAILED [9.9155s] [100%] 2025-12-04T13:24:33.8336912Z 2025-12-04T13:24:33.8336966Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8337067Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T13:24:33.8337113Z Traceback (most recent call last): 2025-12-04T13:24:33.8337276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8337320Z self._join_processes(fn) 2025-12-04T13:24:33.8337492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8337547Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8337725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8337769Z raise RuntimeError(error) 2025-12-04T13:24:33.8337849Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8337893Z Traceback (most recent call last): 2025-12-04T13:24:33.8338054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8338098Z getattr(self, test_name)() 2025-12-04T13:24:33.8338255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8338291Z fn() 2025-12-04T13:24:33.8338442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8338483Z method(*args, **kwargs) 2025-12-04T13:24:33.8338633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8338674Z method(*args, **kwargs) 2025-12-04T13:24:33.8338822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8338860Z with policy(): 2025-12-04T13:24:33.8339011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8339064Z raise RuntimeError(msg) 2025-12-04T13:24:33.8339415Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:24:33.8339418Z 2025-12-04T13:24:33.8339494Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8339767Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8339770Z 2025-12-04T13:24:33.8339857Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8339860Z 2025-12-04T13:24:33.8339936Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8339981Z Traceback (most recent call last): 2025-12-04T13:24:33.8340144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8340202Z getattr(self, test_name)() 2025-12-04T13:24:33.8340359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8340410Z fn() 2025-12-04T13:24:33.8340560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8340600Z method(*args, **kwargs) 2025-12-04T13:24:33.8340749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8340788Z method(*args, **kwargs) 2025-12-04T13:24:33.8340938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8340975Z with policy(): 2025-12-04T13:24:33.8341127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8341169Z raise RuntimeError(msg) 2025-12-04T13:24:33.8341517Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:24:33.8341520Z 2025-12-04T13:24:33.8341593Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8341819Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8341821Z 2025-12-04T13:24:33.8341909Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8341911Z 2025-12-04T13:24:33.8341913Z 2025-12-04T13:24:33.8341989Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8342079Z Process 0 terminated with exit code 10, terminating remaining processes. 
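[editor's note] The "CUDA driver API confirmed a leak" errors above are produced by a policy that snapshots GPU memory counters before the test body and compares them afterwards. Below is a minimal sketch of that before/after comparison, assuming a CUDA/ROCm-enabled build; it illustrates where the reported numbers come from and is not the actual CudaMemoryLeakCheck implementation in common_utils.py:

    import torch

    def assert_no_leak(test_fn, device=0):
        # Snapshot caching-allocator and driver-level usage before the test.
        torch.cuda.synchronize(device)
        caching_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)

        test_fn()

        # Compare afterwards; growth in either counter is what the
        # RuntimeError messages above report per device.
        torch.cuda.synchronize(device)
        caching_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if caching_after > caching_before:
            raise RuntimeError(
                f"Caching allocator allocated memory was {caching_before} "
                f"and is now reported as {caching_after} on device {device}. "
                f"CUDA driver allocated memory was {total - free_before} "
                f"and is now {total - free_after}."
            )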
2025-12-04T13:24:33.8342311Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd8f68ac728a21f0.xml - 2025-12-04T13:24:33.8342372Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8342615Z FAILED [9.9155s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8342662Z Traceback (most recent call last): 2025-12-04T13:24:33.8342824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8342881Z getattr(self, test_name)() 2025-12-04T13:24:33.8343039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8343075Z fn() 2025-12-04T13:24:33.8343226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8343267Z method(*args, **kwargs) 2025-12-04T13:24:33.8343416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8343455Z method(*args, **kwargs) 2025-12-04T13:24:33.8343603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8343719Z with policy(): 2025-12-04T13:24:33.8343883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8343925Z raise RuntimeError(msg) 2025-12-04T13:24:33.8344284Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:24:33.8344296Z 2025-12-04T13:24:33.8344369Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8344598Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8344600Z 2025-12-04T13:24:33.8344684Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8344687Z 2025-12-04T13:24:33.8344746Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8344792Z Traceback (most recent call last): 2025-12-04T13:24:33.8344955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8344999Z getattr(self, test_name)() 2025-12-04T13:24:33.8345155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8345192Z fn() 2025-12-04T13:24:33.8345341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8345382Z method(*args, **kwargs) 2025-12-04T13:24:33.8345531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8345571Z method(*args, **kwargs) 2025-12-04T13:24:33.8345719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8345757Z with policy(): 2025-12-04T13:24:33.8345908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8345950Z raise RuntimeError(msg) 2025-12-04T13:24:33.8346295Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:24:33.8346298Z 2025-12-04T13:24:33.8346371Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8346597Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8346599Z 2025-12-04T13:24:33.8346687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8346761Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
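[editor's note] Alongside the leak itself, every rank prints the `_warn_cpu_init()` UserWarning because the module handed to FSDP is still on CPU. Below is a minimal sketch of the `device_id` usage that warning recommends, assuming a process group is already initialized and one GPU per rank; the Linear module is a placeholder:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = torch.nn.Linear(8, 8)  # placeholder module, still on CPU

    # device_id lets FSDP move the module to the GPU and run its sharding
    # initialization there, which also satisfies the GPU requirement of
    # sync_module_states=True mentioned in the warning.
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),
        sync_module_states=True,
    )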
2025-12-04T13:24:33.8346825Z ====================== 1 failed, 20 deselected in 10.08s ======================= 2025-12-04T13:24:33.8346864Z Got exit code 1 2025-12-04T13:24:33.8347043Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:24:33.8347171Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.8347357Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9076dc9267495fd3.xml 2025-12-04T13:24:33.8347414Z ============================= test session starts ============================== 2025-12-04T13:24:33.8347527Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8347585Z cachedir: .pytest_cache 2025-12-04T13:24:33.8347743Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8347801Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8347842Z configfile: pytest.ini 2025-12-04T13:24:33.8348014Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8348088Z collecting ... collected 60 items / 16 deselected / 44 selected 2025-12-04T13:24:33.8348142Z stepcurrent: skipping 16 already run items. 2025-12-04T13:24:33.8348186Z Running 5 items in this shard 2025-12-04T13:24:33.8348188Z 2025-12-04T13:24:33.8348550Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 13:21:31.029000 468162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 468231 2025-12-04T13:24:33.8348705Z I1204 13:21:31.030000 468162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 468232 2025-12-04T13:24:33.8348858Z I1204 13:21:31.031000 468162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 468233 2025-12-04T13:24:33.8349010Z I1204 13:21:31.031000 468162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 468234 2025-12-04T13:24:33.8349581Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8349621Z _warn_cpu_init() 2025-12-04T13:24:33.8349942Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8349986Z return func(*args, **kwargs) 2025-12-04T13:24:33.8350550Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8350588Z _warn_cpu_init() 2025-12-04T13:24:33.8351166Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8351205Z _warn_cpu_init() 2025-12-04T13:24:33.8351766Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8351805Z _warn_cpu_init() 2025-12-04T13:24:33.8351961Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8352124Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8352426Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8352593Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8352878Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8353004Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8353281Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8353431Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8353708Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8353854Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8354132Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8354268Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8354546Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8354693Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8355237Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8355354Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8355549Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8355967Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8356081Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8356302Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8356480Z [rank2]:E1204 13:21:36.870000 468233 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8356520Z dist init r=2, world=4 2025-12-04T13:24:33.8356671Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8356829Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8357116Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8357270Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8357554Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8357679Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8357956Z [rank1]:E1204 13:21:36.872000 
468232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8358102Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8358378Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8358524Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8358799Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8358935Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8359212Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8359361Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8359943Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
2025-12-04T13:24:33.8360059Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8360254Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8360680Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8360808Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8361031Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8361195Z [rank1]:E1204 13:21:36.872000 468232 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8361235Z dist init r=1, world=4 2025-12-04T13:24:33.8361371Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8361531Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8361818Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8361974Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8362257Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8362382Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8362658Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8362806Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8363084Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8363230Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8363506Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8363642Z [rank3]:E1204 13:21:36.924000 468234 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8363936Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8364085Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8364620Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:24:33.8364748Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8364943Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8365368Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8365491Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8365701Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8365866Z [rank3]:E1204 13:21:36.924000 468234 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8366005Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8366165Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8366528Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8366684Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8366970Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8367094Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8367372Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8367519Z 
[rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8367795Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8367940Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8368229Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8368365Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8368643Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8368789Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8369330Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 2025-12-04T13:24:33.8369456Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8369660Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8370107Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8370220Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8370434Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8370598Z [rank0]:E1204 13:21:36.925000 468231 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8370639Z dist init r=3, world=4 2025-12-04T13:24:33.8370678Z dist init r=0, world=4 2025-12-04T13:24:33.8371014Z [rank0]:[W1204 13:21:37.729713373 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8371054Z FAILED [7.5138s] [ 20%] 2025-12-04T13:24:33.8371056Z 2025-12-04T13:24:33.8371110Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8371263Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.8371310Z Traceback (most recent call last): 2025-12-04T13:24:33.8371473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8371517Z self._join_processes(fn) 2025-12-04T13:24:33.8371690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8371742Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8371922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8371966Z raise RuntimeError(error) 2025-12-04T13:24:33.8372049Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8372095Z Traceback (most recent call last): 2025-12-04T13:24:33.8372270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8372314Z getattr(self, test_name)() 2025-12-04T13:24:33.8372471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8372507Z fn() 2025-12-04T13:24:33.8372658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8372699Z method(*args, **kwargs) 2025-12-04T13:24:33.8372849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8372890Z method(*args, **kwargs) 2025-12-04T13:24:33.8373039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8373090Z with policy(): 2025-12-04T13:24:33.8373242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8373297Z raise RuntimeError(msg) 2025-12-04T13:24:33.8373701Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
2025-12-04T13:24:33.8373717Z 2025-12-04T13:24:33.8373794Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8374080Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8374082Z 2025-12-04T13:24:33.8374171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8374174Z 2025-12-04T13:24:33.8374234Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8374280Z Traceback (most recent call last): 2025-12-04T13:24:33.8374442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8374484Z getattr(self, test_name)() 2025-12-04T13:24:33.8374642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8374677Z fn() 2025-12-04T13:24:33.8374828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8374868Z method(*args, **kwargs) 2025-12-04T13:24:33.8375019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8375059Z method(*args, **kwargs) 2025-12-04T13:24:33.8375209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8375247Z with policy(): 2025-12-04T13:24:33.8375398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8375440Z raise RuntimeError(msg) 2025-12-04T13:24:33.8375841Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8375844Z 2025-12-04T13:24:33.8375918Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8376215Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8376219Z 2025-12-04T13:24:33.8376309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8376312Z 2025-12-04T13:24:33.8376314Z 2025-12-04T13:24:33.8376389Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8376478Z Process 1 terminated with exit code 10, terminating remaining processes. 
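[editor's note] The "Process N terminated with exit code 10, terminating remaining processes." lines come from the multiprocess harness (`_join_processes` / `_check_return_codes` in common_distributed.py), which spawns one worker per rank and inspects exit codes. Below is a simplified sketch of that pattern, with `run_rank` as a hypothetical stand-in for the per-rank test body; exit code 10 and world size 4 match this log:

    import sys
    import torch.multiprocessing as mp

    def run_rank(rank, world_size):
        # Hypothetical per-rank body; a failing leak check would call
        # sys.exit(10) here, mirroring "exiting process N with exit code: 10".
        sys.exit(0)

    if __name__ == "__main__":
        world_size = 4  # matches "world=4" in the log
        ctx = mp.spawn(run_rank, args=(world_size,), nprocs=world_size, join=False)
        try:
            while not ctx.join(timeout=5):
                pass  # poll until every rank has exited
        except mp.ProcessExitedException as e:
            # join() also tears down the remaining workers, as the harness does.
            raise RuntimeError(
                f"Process {e.error_index} exited with error code {e.exit_code}"
            )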
2025-12-04T13:24:33.8376708Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9076dc9267495fd3.xml - 2025-12-04T13:24:33.8376769Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8377085Z FAILED [7.5138s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8377144Z Traceback (most recent call last): 2025-12-04T13:24:33.8377306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8377365Z getattr(self, test_name)() 2025-12-04T13:24:33.8377523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8377559Z fn() 2025-12-04T13:24:33.8377709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8377749Z method(*args, **kwargs) 2025-12-04T13:24:33.8377897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8377940Z method(*args, **kwargs) 2025-12-04T13:24:33.8378089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8378126Z with policy(): 2025-12-04T13:24:33.8378277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8378319Z raise RuntimeError(msg) 2025-12-04T13:24:33.8378721Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
2025-12-04T13:24:33.8378724Z 2025-12-04T13:24:33.8378796Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8379082Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8379085Z 2025-12-04T13:24:33.8379170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8379172Z 2025-12-04T13:24:33.8379231Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8379276Z Traceback (most recent call last): 2025-12-04T13:24:33.8379439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8379482Z getattr(self, test_name)() 2025-12-04T13:24:33.8379639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8379674Z fn() 2025-12-04T13:24:33.8379860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8379901Z method(*args, **kwargs) 2025-12-04T13:24:33.8380064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8380105Z method(*args, **kwargs) 2025-12-04T13:24:33.8380252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8380291Z with policy(): 2025-12-04T13:24:33.8380440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8380482Z raise RuntimeError(msg) 2025-12-04T13:24:33.8380894Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8380896Z 2025-12-04T13:24:33.8380971Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8381272Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8381290Z 2025-12-04T13:24:33.8381378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8381443Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8381505Z ======================= 1 failed, 16 deselected in 7.67s ======================= 2025-12-04T13:24:33.8381542Z Got exit code 1 2025-12-04T13:24:33.8381583Z Retrying single test... 
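"Got exit code 1" followed by "Retrying single test..." means the harness re-invokes pytest on just the failing node id. A rough hand-run equivalent of that retry step (sketch only; the harness's stepcurrent bookkeeping is omitted):

    import os
    import subprocess
    import sys

    # The node id the retry targets, per the "Running only ..." line below.
    test_id = (
        "test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::"
        "test_nested_wrapped_model_single_iteration_mixed_precision_"
        "offload_true_shard_grad_op_cuda"
    )
    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
    )
    result = subprocess.run([sys.executable, "-m", "pytest", "-x", test_id], env=env)
    print("Got exit code", result.returncode)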
2025-12-04T13:24:33.8381774Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31cf5fa477575a76.xml 2025-12-04T13:24:33.8381832Z ============================= test session starts ============================== 2025-12-04T13:24:33.8381944Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8381986Z cachedir: .pytest_cache 2025-12-04T13:24:33.8382144Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8382190Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8382231Z configfile: pytest.ini 2025-12-04T13:24:33.8382394Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8382468Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8382748Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8382793Z Running 1 items in this shard 2025-12-04T13:24:33.8382795Z 2025-12-04T13:24:33.8383154Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 13:21:41.002000 468564 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 468633 2025-12-04T13:24:33.8383309Z I1204 13:21:41.003000 468564 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 468634 2025-12-04T13:24:33.8383461Z I1204 13:21:41.003000 468564 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 468635 2025-12-04T13:24:33.8383609Z I1204 13:21:41.004000 468564 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 468636 2025-12-04T13:24:33.8384260Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8384300Z _warn_cpu_init() 2025-12-04T13:24:33.8384865Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8384903Z _warn_cpu_init() 2025-12-04T13:24:33.8385474Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8385535Z _warn_cpu_init() 2025-12-04T13:24:33.8385824Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8385868Z return func(*args, **kwargs) 2025-12-04T13:24:33.8386432Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8386472Z _warn_cpu_init() 2025-12-04T13:24:33.8386615Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8386777Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8387067Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8387222Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8387508Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8387634Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8387913Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8388061Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8388349Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8388500Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8388774Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8388911Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8389187Z [rank1]:E1204 13:21:46.938000 
468634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8389346Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8389913Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:24:33.8390056Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8390251Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8390664Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8390781Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8390992Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8391155Z [rank1]:E1204 13:21:46.938000 468634 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8391195Z dist init r=1, world=4 2025-12-04T13:24:33.8391333Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8391492Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8391778Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8391933Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8392218Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8392344Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8392634Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8392782Z [rank2]:E1204 13:21:46.939000 468635 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8393057Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8393204Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8393479Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8393629Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8393909Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8394082Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8394611Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8394727Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8394922Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8395336Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8395449Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8395660Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8395824Z [rank2]:E1204 13:21:46.939000 468635 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8395863Z dist init r=2, world=4 2025-12-04T13:24:33.8396001Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8396159Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8396447Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", 
line 925, in run_test 2025-12-04T13:24:33.8396600Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8396898Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8397023Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8397300Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8397447Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8397723Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8397879Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8398156Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8398312Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8398587Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8398735Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8399265Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 18944 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:24:33.8399381Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8399576Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8400027Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8400142Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8400351Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8400516Z [rank0]:E1204 13:21:46.987000 468633 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8400556Z dist init r=0, world=4 2025-12-04T13:24:33.8400692Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8400851Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8401137Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8401306Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8401592Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8401717Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8401993Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8402152Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8402430Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8402593Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8402902Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8403050Z [rank3]:E1204 13:21:46.992000 468636 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8405304Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8405460Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8405996Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:24:33.8406118Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8406316Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8406730Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8406845Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8407059Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8407223Z [rank3]:E1204 13:21:46.992000 468636 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8407262Z dist init r=3, world=4 2025-12-04T13:24:33.8407621Z [rank0]:[W1204 13:21:47.755666516 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8407665Z FAILED [7.6150s] [100%] 2025-12-04T13:24:33.8407667Z 2025-12-04T13:24:33.8407726Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8407879Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.8407927Z Traceback (most recent call last): 2025-12-04T13:24:33.8408091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8408136Z self._join_processes(fn) 2025-12-04T13:24:33.8408309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8408376Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8408557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8408626Z raise RuntimeError(error) 2025-12-04T13:24:33.8408707Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8408765Z Traceback (most recent call last): 2025-12-04T13:24:33.8408925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8408968Z getattr(self, test_name)() 2025-12-04T13:24:33.8409124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8409160Z fn() 2025-12-04T13:24:33.8409312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8409355Z method(*args, **kwargs) 2025-12-04T13:24:33.8409506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8409548Z method(*args, **kwargs) 2025-12-04T13:24:33.8409733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8409772Z with policy(): 2025-12-04T13:24:33.8409923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8409964Z raise RuntimeError(msg) 2025-12-04T13:24:33.8410371Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
2025-12-04T13:24:33.8410374Z 2025-12-04T13:24:33.8410451Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8410741Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8410745Z 2025-12-04T13:24:33.8410833Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8410836Z 2025-12-04T13:24:33.8410895Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8410941Z Traceback (most recent call last): 2025-12-04T13:24:33.8411103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8411147Z getattr(self, test_name)() 2025-12-04T13:24:33.8411305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8411341Z fn() 2025-12-04T13:24:33.8411513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8411555Z method(*args, **kwargs) 2025-12-04T13:24:33.8411704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8411745Z method(*args, **kwargs) 2025-12-04T13:24:33.8411893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8411931Z with policy(): 2025-12-04T13:24:33.8412080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8412122Z raise RuntimeError(msg) 2025-12-04T13:24:33.8412536Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8412555Z 2025-12-04T13:24:33.8412632Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8412935Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8412938Z 2025-12-04T13:24:33.8413025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8413027Z 2025-12-04T13:24:33.8413029Z 2025-12-04T13:24:33.8413107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8413196Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:24:33.8413435Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31cf5fa477575a76.xml - 2025-12-04T13:24:33.8413498Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8413800Z FAILED [7.6150s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8413846Z Traceback (most recent call last): 2025-12-04T13:24:33.8414009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8414051Z getattr(self, test_name)() 2025-12-04T13:24:33.8414211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8414248Z fn() 2025-12-04T13:24:33.8414398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8414439Z method(*args, **kwargs) 2025-12-04T13:24:33.8414588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8414629Z method(*args, **kwargs) 2025-12-04T13:24:33.8414777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8414815Z with policy(): 2025-12-04T13:24:33.8414964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8415005Z raise RuntimeError(msg) 2025-12-04T13:24:33.8415422Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
2025-12-04T13:24:33.8415428Z 2025-12-04T13:24:33.8415503Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8415789Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8415791Z 2025-12-04T13:24:33.8415878Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8415880Z 2025-12-04T13:24:33.8415938Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8415983Z Traceback (most recent call last): 2025-12-04T13:24:33.8416161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8416203Z getattr(self, test_name)() 2025-12-04T13:24:33.8416362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8416410Z fn() 2025-12-04T13:24:33.8416561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8416610Z method(*args, **kwargs) 2025-12-04T13:24:33.8416759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8416798Z method(*args, **kwargs) 2025-12-04T13:24:33.8416947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8416983Z with policy(): 2025-12-04T13:24:33.8417135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8417176Z raise RuntimeError(msg) 2025-12-04T13:24:33.8417576Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8417580Z 2025-12-04T13:24:33.8417653Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8417938Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8417940Z 2025-12-04T13:24:33.8418026Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8418090Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8418155Z ======================= 1 failed, 20 deselected in 7.76s ======================= 2025-12-04T13:24:33.8418192Z Got exit code 1 2025-12-04T13:24:33.8418234Z Retrying single test... 
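The _warn_cpu_init UserWarning repeated in these runs recommends passing device_id so FSDP performs its sharding initialization on GPU rather than CPU. A minimal sketch of that call shape, with a stand-in model, assuming the process group is already initialized on this rank:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes dist.init_process_group(...) already ran for this rank.
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Linear(16, 16)  # stand-in for the test's wrapped module
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),  # shard init on GPU, not CPU
        sync_module_states=True,  # requires the module to reach the GPU
    )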
2025-12-04T13:24:33.8418424Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-64c63adbd056a965.xml 2025-12-04T13:24:33.8418484Z ============================= test session starts ============================== 2025-12-04T13:24:33.8418599Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8418641Z cachedir: .pytest_cache 2025-12-04T13:24:33.8418799Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8418845Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8418886Z configfile: pytest.ini 2025-12-04T13:24:33.8419063Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8419138Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8419419Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8419464Z Running 1 items in this shard 2025-12-04T13:24:33.8419466Z 2025-12-04T13:24:33.8419871Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 13:21:51.302000 468966 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 469035 2025-12-04T13:24:33.8420046Z I1204 13:21:51.303000 468966 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 469036 2025-12-04T13:24:33.8420199Z I1204 13:21:51.304000 468966 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 469037 2025-12-04T13:24:33.8420364Z I1204 13:21:51.304000 468966 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 469038 2025-12-04T13:24:33.8420960Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8420998Z _warn_cpu_init() 2025-12-04T13:24:33.8421566Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8421604Z _warn_cpu_init() 2025-12-04T13:24:33.8422167Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8422203Z _warn_cpu_init() 2025-12-04T13:24:33.8422768Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8422807Z _warn_cpu_init() 2025-12-04T13:24:33.8423186Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8423230Z return func(*args, **kwargs) 2025-12-04T13:24:33.8423372Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8423535Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8423839Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8423996Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8424279Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8424404Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8424704Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8424864Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8425146Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8425303Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8425578Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8425715Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8425994Z [rank1]:E1204 13:21:57.159000 
469036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8426144Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8426673Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:24:33.8426791Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8426987Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8427405Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8427518Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8427729Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8427907Z [rank1]:E1204 13:21:57.159000 469036 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8427947Z dist init r=1, world=4 2025-12-04T13:24:33.8428086Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8428245Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8428532Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8428684Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8428980Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8429115Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8429393Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8429551Z [rank2]:E1204 13:21:57.162000 469037 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8429863Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8430011Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8430286Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8430423Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8430698Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8430846Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8431378Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 18944 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8431493Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8431689Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8432104Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8432232Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8432444Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8432609Z [rank2]:E1204 13:21:57.162000 469037 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8432649Z dist init r=2, world=4 2025-12-04T13:24:33.8432785Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8432944Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8433246Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", 
line 925, in run_test 2025-12-04T13:24:33.8433399Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8433696Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8433833Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8434110Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8434258Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8434537Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8434684Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8434960Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8435094Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8435373Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8435521Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8436047Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:24:33.8436162Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8436357Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8436783Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8436898Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8437108Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8437272Z [rank0]:E1204 13:21:57.178000 469035 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8437311Z dist init r=0, world=4 2025-12-04T13:24:33.8437459Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8437618Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8437914Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8438077Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8438361Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8438484Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8438761Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8438910Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8439185Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8439331Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8439606Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8439782Z [rank3]:E1204 13:21:57.178000 469038 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8440059Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8440207Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8440760Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:24:33.8440873Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8441069Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8441484Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8441598Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8441822Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8441985Z [rank3]:E1204 13:21:57.178000 469038 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8442039Z dist init r=3, world=4 2025-12-04T13:24:33.8442388Z [rank0]:[W1204 13:21:57.849149715 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8442428Z FAILED [7.5133s] [100%] 2025-12-04T13:24:33.8442430Z 2025-12-04T13:24:33.8442487Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8442639Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T13:24:33.8442685Z Traceback (most recent call last): 2025-12-04T13:24:33.8442849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8442893Z self._join_processes(fn) 2025-12-04T13:24:33.8443066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8443121Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8443300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8443344Z raise RuntimeError(error) 2025-12-04T13:24:33.8443425Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8443470Z Traceback (most recent call last): 2025-12-04T13:24:33.8443633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8443677Z getattr(self, test_name)() 2025-12-04T13:24:33.8443836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8443873Z fn() 2025-12-04T13:24:33.8444024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8444066Z method(*args, **kwargs) 2025-12-04T13:24:33.8444215Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8444255Z method(*args, **kwargs) 2025-12-04T13:24:33.8444404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8444442Z with policy(): 2025-12-04T13:24:33.8444593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8444635Z raise RuntimeError(msg) 2025-12-04T13:24:33.8445049Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:24:33.8445053Z 2025-12-04T13:24:33.8445129Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8445415Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8445419Z 2025-12-04T13:24:33.8445507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8445510Z 2025-12-04T13:24:33.8445580Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8445626Z Traceback (most recent call last): 2025-12-04T13:24:33.8445789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8445843Z getattr(self, test_name)() 2025-12-04T13:24:33.8446003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8446049Z fn() 2025-12-04T13:24:33.8446199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8446239Z method(*args, **kwargs) 2025-12-04T13:24:33.8446388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8446427Z method(*args, **kwargs) 2025-12-04T13:24:33.8446577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8446614Z with policy(): 2025-12-04T13:24:33.8446767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8446808Z raise RuntimeError(msg) 2025-12-04T13:24:33.8447210Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
2025-12-04T13:24:33.8447212Z 2025-12-04T13:24:33.8447287Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8447571Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8447573Z 2025-12-04T13:24:33.8447663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8447666Z 2025-12-04T13:24:33.8447723Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8447769Z Traceback (most recent call last): 2025-12-04T13:24:33.8447930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8447973Z getattr(self, test_name)() 2025-12-04T13:24:33.8448130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8448166Z fn() 2025-12-04T13:24:33.8448315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8448355Z method(*args, **kwargs) 2025-12-04T13:24:33.8448504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8448557Z method(*args, **kwargs) 2025-12-04T13:24:33.8448707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8448744Z with policy(): 2025-12-04T13:24:33.8448895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8448937Z raise RuntimeError(msg) 2025-12-04T13:24:33.8449338Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 18944 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8449341Z 2025-12-04T13:24:33.8449423Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8449750Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8449771Z 2025-12-04T13:24:33.8449858Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8449874Z 2025-12-04T13:24:33.8449875Z 2025-12-04T13:24:33.8449952Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8450039Z Process 0 terminated with exit code 10, terminating remaining processes. 
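----------------------------- editor annotation ------------------------------
The ProcessGroupNCCL warning earlier in this failure ("destroy_process_group()
was not called before program exit, which can leak resources") points at the
teardown the workers skip when they abort with exit code 10. A hedged sketch
of the explicit teardown it asks for follows; run_worker and the env-based
rendezvous are illustrative, not taken from the test harness.

    import torch.distributed as dist

    def run_worker(rank: int, world_size: int) -> None:
        # Assumes MASTER_ADDR/MASTER_PORT are set in the environment.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # test body would go here
        finally:
            # Explicit teardown avoids the ProcessGroupNCCL exit warning.
            dist.destroy_process_group()
------------------------------- end annotation -------------------------------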
2025-12-04T13:24:33.8450273Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-64c63adbd056a965.xml - 2025-12-04T13:24:33.8450333Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8450640Z FAILED [7.5133s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8450688Z Traceback (most recent call last): 2025-12-04T13:24:33.8450850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8450895Z getattr(self, test_name)() 2025-12-04T13:24:33.8451052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8451088Z fn() 2025-12-04T13:24:33.8451237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8451278Z method(*args, **kwargs) 2025-12-04T13:24:33.8451429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8451469Z method(*args, **kwargs) 2025-12-04T13:24:33.8451618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8451656Z with policy(): 2025-12-04T13:24:33.8451807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8451849Z raise RuntimeError(msg) 2025-12-04T13:24:33.8452249Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:24:33.8452252Z 2025-12-04T13:24:33.8452325Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8452623Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8452626Z 2025-12-04T13:24:33.8452712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8452715Z 2025-12-04T13:24:33.8452773Z Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8452818Z Traceback (most recent call last): 2025-12-04T13:24:33.8452982Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8453025Z getattr(self, test_name)() 2025-12-04T13:24:33.8453183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8453218Z fn() 2025-12-04T13:24:33.8453381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8453422Z method(*args, **kwargs) 2025-12-04T13:24:33.8453583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8453623Z method(*args, **kwargs) 2025-12-04T13:24:33.8453789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8453825Z with policy(): 2025-12-04T13:24:33.8453977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8454017Z raise RuntimeError(msg) 2025-12-04T13:24:33.8454419Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
2025-12-04T13:24:33.8454421Z 2025-12-04T13:24:33.8454495Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8454777Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8454780Z 2025-12-04T13:24:33.8454867Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8454870Z 2025-12-04T13:24:33.8454926Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8454972Z Traceback (most recent call last): 2025-12-04T13:24:33.8455133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8455175Z getattr(self, test_name)() 2025-12-04T13:24:33.8455336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8455372Z fn() 2025-12-04T13:24:33.8455519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8455561Z method(*args, **kwargs) 2025-12-04T13:24:33.8455709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8455750Z method(*args, **kwargs) 2025-12-04T13:24:33.8455898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8455934Z with policy(): 2025-12-04T13:24:33.8456084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8456124Z raise RuntimeError(msg) 2025-12-04T13:24:33.8456539Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 18944 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:24:33.8456543Z 2025-12-04T13:24:33.8456615Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8456897Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8456900Z 2025-12-04T13:24:33.8456983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8457048Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:24:33.8457120Z ======================= 1 failed, 20 deselected in 7.66s ======================= 2025-12-04T13:24:33.8457159Z Got exit code 1 2025-12-04T13:24:33.8457393Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8457538Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.8457741Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-03d682ae9b0da8c4.xml 2025-12-04T13:24:33.8457798Z ============================= test session starts ============================== 2025-12-04T13:24:33.8457911Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8457953Z cachedir: .pytest_cache 2025-12-04T13:24:33.8458112Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8458159Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8458202Z configfile: pytest.ini 2025-12-04T13:24:33.8458365Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8458439Z collecting ... collected 60 items / 17 deselected / 43 selected 2025-12-04T13:24:33.8458493Z stepcurrent: skipping 17 already run items. 2025-12-04T13:24:33.8458538Z Running 4 items in this shard 2025-12-04T13:24:33.8458540Z 2025-12-04T13:24:33.8458835Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 13:22:01.081000 469368 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 469437 2025-12-04T13:24:33.8458989Z I1204 13:22:01.082000 469368 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 469438 2025-12-04T13:24:33.8459141Z I1204 13:22:01.083000 469368 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 469439 2025-12-04T13:24:33.8459292Z I1204 13:22:01.083000 469368 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 469440 2025-12-04T13:24:33.8459657Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8459759Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8460114Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8460162Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8460529Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8460577Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8460926Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8460971Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8461556Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8461608Z _warn_cpu_init() 2025-12-04T13:24:33.8462168Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8462220Z _warn_cpu_init() 2025-12-04T13:24:33.8462783Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8462821Z _warn_cpu_init() 2025-12-04T13:24:33.8463112Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8463154Z return func(*args, **kwargs) 2025-12-04T13:24:33.8463718Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.8463755Z _warn_cpu_init() 2025-12-04T13:24:33.8463899Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8464061Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8464348Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8464502Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8464799Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8464926Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8465201Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8465351Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8465626Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8465783Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8466059Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8466215Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8466493Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8466639Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8467113Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 
2025-12-04T13:24:33.8467231Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8467426Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8467768Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8467881Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8468094Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8468258Z [rank2]:E1204 13:22:10.758000 469439 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8468299Z dist init r=2, world=4 2025-12-04T13:24:33.8468437Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8468597Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8468884Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8469048Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8469334Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8469458Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8469782Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8469928Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8470216Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8470384Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8470672Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8470807Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8471084Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8471234Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8471700Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:24:33.8471816Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8472011Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8472355Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8472470Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8472681Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8472844Z [rank3]:E1204 13:22:10.760000 469440 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8472883Z dist init r=3, world=4 2025-12-04T13:24:33.8473020Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8473179Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8473478Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8473633Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8473918Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8474042Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8474329Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8474477Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:24:33.8474766Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8474923Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8475197Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8475334Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8475613Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8475761Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8476224Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:24:33.8476338Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8476536Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8476877Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8476990Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8477201Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8477364Z [rank1]:E1204 13:22:10.779000 469438 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8477404Z dist init r=1, world=4 2025-12-04T13:24:33.8477552Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8477714Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8478000Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8478153Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T13:24:33.8478449Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8478574Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8478863Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8479021Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8479296Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8479442Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8479763Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8479899Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8480178Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8480326Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8480790Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296. 
2025-12-04T13:24:33.8480906Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8481106Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8481451Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8481564Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8481789Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8481952Z [rank0]:E1204 13:22:10.794000 469437 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8481992Z dist init r=0, world=4 2025-12-04T13:24:33.8482327Z [rank0]:[W1204 13:22:11.617010265 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8482369Z FAILED [11.8157s] [ 25%] 2025-12-04T13:24:33.8482371Z 2025-12-04T13:24:33.8482430Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8482526Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T13:24:33.8482573Z Traceback (most recent call last): 2025-12-04T13:24:33.8482749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8482807Z self._join_processes(fn) 2025-12-04T13:24:33.8482978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8483046Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8483223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8483267Z raise RuntimeError(error) 2025-12-04T13:24:33.8483348Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8483392Z Traceback (most recent call last): 2025-12-04T13:24:33.8483552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8483595Z getattr(self, test_name)() 2025-12-04T13:24:33.8483753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8483788Z fn() 2025-12-04T13:24:33.8483939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8483981Z method(*args, **kwargs) 2025-12-04T13:24:33.8484131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8484170Z method(*args, **kwargs) 2025-12-04T13:24:33.8484319Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8484356Z with policy(): 2025-12-04T13:24:33.8484506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8484548Z raise RuntimeError(msg) 2025-12-04T13:24:33.8484887Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:24:33.8484891Z 2025-12-04T13:24:33.8484966Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8485180Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8485182Z 2025-12-04T13:24:33.8485270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8485272Z 2025-12-04T13:24:33.8485330Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8485377Z Traceback (most recent call last): 2025-12-04T13:24:33.8485551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8485594Z getattr(self, test_name)() 2025-12-04T13:24:33.8485752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8485789Z fn() 2025-12-04T13:24:33.8485937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8485979Z method(*args, **kwargs) 2025-12-04T13:24:33.8486126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8486166Z method(*args, **kwargs) 2025-12-04T13:24:33.8486315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8486353Z with policy(): 2025-12-04T13:24:33.8486522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8486576Z raise RuntimeError(msg) 2025-12-04T13:24:33.8486911Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 
2025-12-04T13:24:33.8486923Z 2025-12-04T13:24:33.8486998Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8487213Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8487215Z 2025-12-04T13:24:33.8487302Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8487304Z 2025-12-04T13:24:33.8487363Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8487408Z Traceback (most recent call last): 2025-12-04T13:24:33.8487570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8487612Z getattr(self, test_name)() 2025-12-04T13:24:33.8487770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8487806Z fn() 2025-12-04T13:24:33.8487957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8487996Z method(*args, **kwargs) 2025-12-04T13:24:33.8488144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8488183Z method(*args, **kwargs) 2025-12-04T13:24:33.8488332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8488370Z with policy(): 2025-12-04T13:24:33.8488520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8488562Z raise RuntimeError(msg) 2025-12-04T13:24:33.8488899Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:24:33.8488901Z 2025-12-04T13:24:33.8488974Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8489184Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8489187Z 2025-12-04T13:24:33.8489273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8489275Z 2025-12-04T13:24:33.8489290Z 2025-12-04T13:24:33.8489366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8489454Z Process 1 terminated with exit code 10, terminating remaining processes. 
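----------------------------- editor annotation ------------------------------
The outer traceback (wrapper -> _join_processes -> _check_return_codes) shows
the harness pattern behind these reports: the parent joins every worker
process, then turns any nonzero exit code into the "Process N exited with
error code 10" RuntimeError seen above. A simplified sketch of that
join-and-check shape, not PyTorch's actual code:

    import multiprocessing as mp

    def join_and_check(procs: list[mp.Process]) -> None:
        for p in procs:
            p.join()  # wait for every rank before inspecting exit codes
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(
                    f"Process {rank} exited with error code {p.exitcode}")
------------------------------- end annotation -------------------------------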
2025-12-04T13:24:33.8489727Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-03d682ae9b0da8c4.xml - 2025-12-04T13:24:33.8489789Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8490025Z FAILED [11.8157s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8490072Z Traceback (most recent call last): 2025-12-04T13:24:33.8490252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8490297Z getattr(self, test_name)() 2025-12-04T13:24:33.8490456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8490504Z fn() 2025-12-04T13:24:33.8490669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8490709Z method(*args, **kwargs) 2025-12-04T13:24:33.8490857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8490897Z method(*args, **kwargs) 2025-12-04T13:24:33.8491045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8491081Z with policy(): 2025-12-04T13:24:33.8491232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8491274Z raise RuntimeError(msg) 2025-12-04T13:24:33.8491611Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 
2025-12-04T13:24:33.8491616Z 2025-12-04T13:24:33.8491689Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8491901Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8491903Z 2025-12-04T13:24:33.8491989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8491993Z 2025-12-04T13:24:33.8492050Z Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8492096Z Traceback (most recent call last): 2025-12-04T13:24:33.8492257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8492300Z getattr(self, test_name)() 2025-12-04T13:24:33.8492459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8492493Z fn() 2025-12-04T13:24:33.8492642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8492683Z method(*args, **kwargs) 2025-12-04T13:24:33.8492831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8492871Z method(*args, **kwargs) 2025-12-04T13:24:33.8493019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8493056Z with policy(): 2025-12-04T13:24:33.8493219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8493261Z raise RuntimeError(msg) 2025-12-04T13:24:33.8493597Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 
2025-12-04T13:24:33.8493600Z 2025-12-04T13:24:33.8493673Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8493884Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8493887Z 2025-12-04T13:24:33.8493983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8493985Z 2025-12-04T13:24:33.8494044Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8494088Z Traceback (most recent call last): 2025-12-04T13:24:33.8494261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8494313Z getattr(self, test_name)() 2025-12-04T13:24:33.8494471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8494504Z fn() 2025-12-04T13:24:33.8494653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8494693Z method(*args, **kwargs) 2025-12-04T13:24:33.8494843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8494884Z method(*args, **kwargs) 2025-12-04T13:24:33.8495033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8495070Z with policy(): 2025-12-04T13:24:33.8495220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8495262Z raise RuntimeError(msg) 2025-12-04T13:24:33.8495597Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:24:33.8495600Z 2025-12-04T13:24:33.8495672Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8495884Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8495886Z 2025-12-04T13:24:33.8495972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8496036Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8496100Z ====================== 1 failed, 17 deselected in 11.98s ======================= 2025-12-04T13:24:33.8496138Z Got exit code 1 2025-12-04T13:24:33.8496178Z Retrying single test... 
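----------------------------- editor annotation ------------------------------
Each test session above and below also prints the transformer.py UserWarning:
enable_nested_tensor stays disabled because the encoder layer was built with
batch_first=False. A minimal sketch of the constructor change the warning
suggests; the dimensions are made up for illustration:

    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    # With batch_first=True the nested-tensor fast path can actually engage.
    encoder = nn.TransformerEncoder(layer, num_layers=2,
                                    enable_nested_tensor=True)
------------------------------- end annotation -------------------------------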
2025-12-04T13:24:33.8496366Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2b9dac894276d51e.xml 2025-12-04T13:24:33.8496427Z ============================= test session starts ============================== 2025-12-04T13:24:33.8496539Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8496580Z cachedir: .pytest_cache 2025-12-04T13:24:33.8496738Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8496796Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8496838Z configfile: pytest.ini 2025-12-04T13:24:33.8497001Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8497075Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8497288Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8497331Z Running 1 items in this shard 2025-12-04T13:24:33.8497333Z 2025-12-04T13:24:33.8497625Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 13:22:15.201000 469770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 469839 2025-12-04T13:24:33.8497790Z I1204 13:22:15.202000 469770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 469840 2025-12-04T13:24:33.8497941Z I1204 13:22:15.203000 469770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 469841 2025-12-04T13:24:33.8498103Z I1204 13:22:15.204000 469770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 469842 2025-12-04T13:24:33.8498470Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8498521Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8498873Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8498923Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8499272Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8499320Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8499670Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8499755Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8500330Z 
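The four enable_nested_tensor UserWarnings above (one per rank) fire because the test builds its TransformerEncoderLayer without batch_first=True, so TransformerEncoder silently disables the nested-tensor fast path. A minimal standalone repro of the warning-free construction; the dimensions are illustrative:

    import torch
    import torch.nn as nn

    # batch_first=True on the layer lets enable_nested_tensor take effect.
    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)
    out = encoder(torch.randn(8, 16, 64))  # (batch, seq, feature)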
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8500372Z _warn_cpu_init() 2025-12-04T13:24:33.8500934Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8500972Z _warn_cpu_init() 2025-12-04T13:24:33.8501555Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8501594Z _warn_cpu_init() 2025-12-04T13:24:33.8502157Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8502208Z _warn_cpu_init() 2025-12-04T13:24:33.8502500Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:24:33.8502556Z return func(*args, **kwargs) 2025-12-04T13:24:33.8502712Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8502874Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8503164Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8503319Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8503606Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8503732Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8504009Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8504159Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8504438Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8504585Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8504860Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8504999Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8505275Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8505426Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8505907Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 
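The _warn_cpu_init warnings a few lines up spell out the fix they recommend: pass device_id so FSDP moves the module to GPU before sharding. A sketch of that constructor call, assuming a process group is already initialized in the worker; the Linear model is a stand-in, and CPUOffload mirrors the offload_true configuration this test exercises:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP

    model = nn.Linear(128, 128)  # still on CPU here, as in the warning
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),  # shard-init on GPU, not CPU
        cpu_offload=CPUOffload(offload_params=True),
        sync_module_states=True,                 # needs GPU communication
    )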
2025-12-04T13:24:33.8506024Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8506220Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8506573Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8506689Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8506911Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8507086Z [rank2]:E1204 13:22:24.971000 469841 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8507126Z dist init r=2, world=4 2025-12-04T13:24:33.8507263Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8507423Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8507710Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8507864Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8508147Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8508271Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8508546Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8508695Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8508974Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8509120Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8509394Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8509528Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8509859Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8510007Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8510475Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 114176 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:24:33.8510590Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8510797Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8511142Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8511281Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8511492Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8511654Z [rank3]:E1204 13:22:24.990000 469842 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8511694Z dist init r=3, world=4 2025-12-04T13:24:33.8511832Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8511992Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8512281Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8512434Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8512718Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8512841Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8513119Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8513267Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:24:33.8513546Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8513692Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8513968Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8514114Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8514392Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8514539Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8515012Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:24:33.8515128Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8515333Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8515691Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8515805Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8516017Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8516182Z [rank1]:E1204 13:22:25.030000 469840 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8516222Z dist init r=1, world=4 2025-12-04T13:24:33.8516359Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8516518Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8516804Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8516958Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T13:24:33.8517241Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8517366Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8517641Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8517787Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8518066Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8518226Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8518505Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8518640Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8518918Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8519064Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8519538Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 114176 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296. 
2025-12-04T13:24:33.8519673Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8519906Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8520246Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8520359Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8520572Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8520736Z [rank0]:E1204 13:22:25.066000 469839 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8520777Z dist init r=0, world=4 2025-12-04T13:24:33.8521111Z [rank0]:[W1204 13:22:25.956875264 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8521152Z FAILED [11.8174s] [100%] 2025-12-04T13:24:33.8521155Z 2025-12-04T13:24:33.8521209Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8521306Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T13:24:33.8521354Z Traceback (most recent call last): 2025-12-04T13:24:33.8521517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8521561Z self._join_processes(fn) 2025-12-04T13:24:33.8521734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8521789Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8521966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8522009Z raise RuntimeError(error) 2025-12-04T13:24:33.8522088Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8522134Z Traceback (most recent call last): 2025-12-04T13:24:33.8522309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8522355Z getattr(self, test_name)() 2025-12-04T13:24:33.8522512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8522548Z fn() 2025-12-04T13:24:33.8522700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8522741Z method(*args, **kwargs) 2025-12-04T13:24:33.8522890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8522930Z method(*args, **kwargs) 2025-12-04T13:24:33.8523078Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8523116Z with policy(): 2025-12-04T13:24:33.8523281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8523338Z raise RuntimeError(msg) 2025-12-04T13:24:33.8523673Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 2025-12-04T13:24:33.8523691Z 2025-12-04T13:24:33.8523766Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8523981Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8523983Z 2025-12-04T13:24:33.8524072Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8524074Z 2025-12-04T13:24:33.8524076Z 2025-12-04T13:24:33.8524152Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8524240Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8524473Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2b9dac894276d51e.xml - 2025-12-04T13:24:33.8524534Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8524771Z FAILED [11.8174s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8524816Z Traceback (most recent call last): 2025-12-04T13:24:33.8524981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8525024Z getattr(self, test_name)() 2025-12-04T13:24:33.8525184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8525220Z fn() 2025-12-04T13:24:33.8525369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8525411Z method(*args, **kwargs) 2025-12-04T13:24:33.8525560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8525601Z method(*args, **kwargs) 2025-12-04T13:24:33.8525749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8525787Z with policy(): 2025-12-04T13:24:33.8525936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8525978Z raise RuntimeError(msg) 2025-12-04T13:24:33.8526325Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 
2025-12-04T13:24:33.8526328Z 2025-12-04T13:24:33.8526405Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8526619Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8526622Z 2025-12-04T13:24:33.8526709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8526772Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8526834Z ====================== 1 failed, 20 deselected in 11.98s ======================= 2025-12-04T13:24:33.8526881Z Got exit code 1 2025-12-04T13:24:33.8526921Z Retrying single test... 2025-12-04T13:24:33.8527110Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e84d3f08b788a32e.xml 2025-12-04T13:24:33.8527178Z ============================= test session starts ============================== 2025-12-04T13:24:33.8527301Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8527341Z cachedir: .pytest_cache 2025-12-04T13:24:33.8527500Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8527545Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8527586Z configfile: pytest.ini 2025-12-04T13:24:33.8527746Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8527822Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8528033Z stepcurrent: skipping 17 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8528079Z Running 1 items in this shard 2025-12-04T13:24:33.8528081Z 2025-12-04T13:24:33.8528373Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 13:22:29.645000 470172 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 470241 2025-12-04T13:24:33.8528527Z I1204 13:22:29.645000 470172 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 470242 2025-12-04T13:24:33.8528678Z I1204 13:22:29.646000 470172 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 470243 2025-12-04T13:24:33.8528828Z I1204 13:22:29.646000 470172 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 470244 2025-12-04T13:24:33.8529187Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8529237Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8529588Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8529635Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8530022Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8530083Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8530432Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8530479Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8531052Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8531105Z _warn_cpu_init() 2025-12-04T13:24:33.8531396Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8531459Z return func(*args, **kwargs) 2025-12-04T13:24:33.8532037Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8532075Z _warn_cpu_init() 2025-12-04T13:24:33.8532640Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8532678Z _warn_cpu_init() 2025-12-04T13:24:33.8533239Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8533275Z _warn_cpu_init() 2025-12-04T13:24:33.8533420Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8533584Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8533872Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8534028Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8534312Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8534438Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8534728Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8534878Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8535154Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8535301Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8535587Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8535723Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8536012Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8536169Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8536638Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:24:33.8536755Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8536952Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8537295Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8537409Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8537621Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8537785Z [rank1]:E1204 13:22:39.220000 470242 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8537826Z dist init r=1, world=4 2025-12-04T13:24:33.8537963Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8538124Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8538409Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8538561Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8538854Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8538978Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8539258Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8539405Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8539681Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8539884Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8540174Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8540322Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8540599Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8540746Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8541212Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 
2025-12-04T13:24:33.8541328Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8541525Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8541865Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8541980Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8542191Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8542355Z [rank2]:E1204 13:22:39.222000 470243 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8542395Z dist init r=2, world=4 2025-12-04T13:24:33.8542532Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8542690Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8542977Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8543145Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8543430Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8543555Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8543832Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8543989Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8544266Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8544424Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8544713Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8544849Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8545127Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8545274Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8545738Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:24:33.8545853Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8546049Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8546392Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8546504Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8546715Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8546878Z [rank3]:E1204 13:22:39.271000 470244 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8546918Z dist init r=3, world=4 2025-12-04T13:24:33.8547054Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8547225Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8547509Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8547664Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8547948Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8548072Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8548373Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8548530Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:24:33.8548816Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8548961Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8549236Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8549373Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8549650Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8549845Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8550307Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296. 2025-12-04T13:24:33.8550423Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8550620Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8550965Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8551078Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8551287Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8551452Z [rank0]:E1204 13:22:39.287000 470241 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8551490Z dist init r=0, world=4 2025-12-04T13:24:33.8551838Z [rank0]:[W1204 13:22:39.085318032 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8551880Z FAILED [11.5177s] [100%] 2025-12-04T13:24:33.8551882Z 2025-12-04T13:24:33.8551937Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8552033Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T13:24:33.8552080Z Traceback (most recent call last): 2025-12-04T13:24:33.8552241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8552287Z self._join_processes(fn) 2025-12-04T13:24:33.8552472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8552526Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8552719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8552775Z raise RuntimeError(error) 2025-12-04T13:24:33.8552856Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8552901Z Traceback (most recent call last): 2025-12-04T13:24:33.8553061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8553103Z getattr(self, test_name)() 2025-12-04T13:24:33.8553261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8553296Z fn() 2025-12-04T13:24:33.8553448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8553490Z method(*args, **kwargs) 2025-12-04T13:24:33.8553640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8553681Z method(*args, **kwargs) 2025-12-04T13:24:33.8553830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8553868Z with policy(): 2025-12-04T13:24:33.8554019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8554060Z raise RuntimeError(msg) 2025-12-04T13:24:33.8554401Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:24:33.8554404Z 2025-12-04T13:24:33.8554479Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8554694Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8554697Z 2025-12-04T13:24:33.8554785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8554787Z 2025-12-04T13:24:33.8554789Z 2025-12-04T13:24:33.8554862Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8554950Z Process 1 terminated with exit code 10, terminating remaining processes. 
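Two lifecycle warnings recur in this run: barrier() guessing its device, and ProcessGroupNCCL complaining that destroy_process_group() was never called before exit. Both point at the same pattern, sketched below under the assumption that the worker was launched with the usual env:// rendezvous variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT):

    import os
    import torch
    import torch.distributed as dist

    rank = int(os.environ["RANK"])
    # device_id here is exactly what the barrier() warning suggests passing.
    dist.init_process_group(
        backend="nccl",
        init_method="env://",
        device_id=torch.device("cuda", rank % torch.cuda.device_count()),
    )
    try:
        dist.barrier()  # no "using the device under current context" warning
    finally:
        dist.destroy_process_group()  # silences the NCCL shutdown warning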
2025-12-04T13:24:33.8555184Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e84d3f08b788a32e.xml - 2025-12-04T13:24:33.8555270Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8555505Z FAILED [11.5177s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8555555Z Traceback (most recent call last): 2025-12-04T13:24:33.8555717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8555760Z getattr(self, test_name)() 2025-12-04T13:24:33.8555916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8555952Z fn() 2025-12-04T13:24:33.8556101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8556142Z method(*args, **kwargs) 2025-12-04T13:24:33.8556301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8556354Z method(*args, **kwargs) 2025-12-04T13:24:33.8556503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8556550Z with policy(): 2025-12-04T13:24:33.8556700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8556741Z raise RuntimeError(msg) 2025-12-04T13:24:33.8557078Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:24:33.8557081Z 2025-12-04T13:24:33.8557156Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8557373Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8557376Z 2025-12-04T13:24:33.8557462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8557527Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
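
Two of the warnings interleaved with these failures point at the same cleanup pattern: ProcessGroupNCCL complains that destroy_process_group() was never called before exit, and barrier() asks for a device_id at init time. A minimal sketch of the lifecycle both warnings recommend — reading RANK from the launcher environment is an assumption, and the backend string stays "nccl" on ROCm builds:

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["RANK"])  # assumed to be set by the launcher
        device = torch.device("cuda", rank % torch.cuda.device_count())
        # Passing device_id binds the group to one GPU and silences the
        # "barrier(): using the device under current context" warning.
        dist.init_process_group("nccl", device_id=device)
        try:
            dist.barrier()
            # ... test or training body ...
        finally:
            # Explicit shutdown avoids the ProcessGroupNCCL leak warning.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()
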
2025-12-04T13:24:33.8557589Z ====================== 1 failed, 20 deselected in 11.67s ======================= 2025-12-04T13:24:33.8557628Z Got exit code 1 2025-12-04T13:24:33.8557790Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T13:24:33.8557918Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.8558107Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-dd025d88dd91e0ff.xml 2025-12-04T13:24:33.8558164Z ============================= test session starts ============================== 2025-12-04T13:24:33.8558276Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8558317Z cachedir: .pytest_cache 2025-12-04T13:24:33.8558475Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8558520Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8558561Z configfile: pytest.ini 2025-12-04T13:24:33.8558721Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8558796Z collecting ... collected 60 items / 18 deselected / 42 selected 2025-12-04T13:24:33.8558849Z stepcurrent: skipping 18 already run items. 2025-12-04T13:24:33.8558893Z Running 3 items in this shard 2025-12-04T13:24:33.8558896Z 2025-12-04T13:24:33.8559215Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 13:22:43.661000 470574 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 470643 2025-12-04T13:24:33.8559369Z I1204 13:22:43.661000 470574 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 470644 2025-12-04T13:24:33.8559521Z I1204 13:22:43.662000 470574 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 470645 2025-12-04T13:24:33.8559672Z I1204 13:22:43.663000 470574 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 470646 2025-12-04T13:24:33.8560088Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8560137Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8560490Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8560566Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8560913Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8560958Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8561307Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8561353Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8561925Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8561965Z _warn_cpu_init() 2025-12-04T13:24:33.8562255Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8562300Z return func(*args, **kwargs) 2025-12-04T13:24:33.8562865Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8562905Z _warn_cpu_init() 2025-12-04T13:24:33.8563465Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8563522Z _warn_cpu_init() 2025-12-04T13:24:33.8564083Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:24:33.8564121Z _warn_cpu_init() 2025-12-04T13:24:33.8564265Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8564426Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8564727Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8564891Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8565185Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8565310Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8565587Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8565736Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8566012Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8566160Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8566436Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8566571Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8566852Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8567000Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8567482Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 
2025-12-04T13:24:33.8567597Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8567804Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8568162Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8568278Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8568488Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8568651Z [rank3]:E1204 13:22:53.270000 470646 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8568691Z dist init r=3, world=4 2025-12-04T13:24:33.8568845Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8569005Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8569302Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8569467Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8569793Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8569917Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8570194Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8570341Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8570617Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8570762Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8571038Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8571175Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8571452Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8571601Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8572077Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:24:33.8572206Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8572401Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8572759Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8572872Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8573095Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8573260Z [rank1]:E1204 13:22:53.275000 470644 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8573313Z dist init r=1, world=4 2025-12-04T13:24:33.8573450Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8573621Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8573913Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8574065Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8574350Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8574474Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8574750Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8574897Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:24:33.8575173Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8575319Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8575594Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8575730Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8576006Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8576153Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8576637Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 2025-12-04T13:24:33.8576752Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8576946Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8577310Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8577427Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8577649Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8577829Z [rank2]:E1204 13:22:53.323000 470645 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8577868Z dist init r=2, world=4 2025-12-04T13:24:33.8578003Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8578161Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8578449Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8578604Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:24:33.8578886Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8579009Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8579284Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8579431Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8579756Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8579903Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8580178Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8580313Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8580607Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8580756Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8581229Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296. 
2025-12-04T13:24:33.8581344Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8581551Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8581906Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8582044Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8582255Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8582418Z [rank0]:E1204 13:22:53.332000 470643 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8582457Z dist init r=0, world=4 2025-12-04T13:24:33.8582793Z [rank0]:[W1204 13:22:53.143214089 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8582834Z FAILED [11.5186s] [ 33%] 2025-12-04T13:24:33.8582837Z 2025-12-04T13:24:33.8582892Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8582993Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____ 2025-12-04T13:24:33.8583040Z Traceback (most recent call last): 2025-12-04T13:24:33.8583202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8583246Z self._join_processes(fn) 2025-12-04T13:24:33.8583418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8583472Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8583649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8583694Z raise RuntimeError(error) 2025-12-04T13:24:33.8583774Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8583819Z Traceback (most recent call last): 2025-12-04T13:24:33.8583980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8584023Z getattr(self, test_name)() 2025-12-04T13:24:33.8584179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8584214Z fn() 2025-12-04T13:24:33.8584364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8584405Z method(*args, **kwargs) 2025-12-04T13:24:33.8584566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8584607Z method(*args, **kwargs) 2025-12-04T13:24:33.8584757Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8584795Z with policy(): 2025-12-04T13:24:33.8584946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8584986Z raise RuntimeError(msg) 2025-12-04T13:24:33.8585336Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:24:33.8585338Z 2025-12-04T13:24:33.8585423Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8585655Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8585668Z 2025-12-04T13:24:33.8585756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8585769Z 2025-12-04T13:24:33.8585771Z 2025-12-04T13:24:33.8585845Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8585934Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8586166Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-dd025d88dd91e0ff.xml - 2025-12-04T13:24:33.8586226Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8586473Z FAILED [11.5186s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8586521Z Traceback (most recent call last): 2025-12-04T13:24:33.8586684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8586728Z getattr(self, test_name)() 2025-12-04T13:24:33.8586890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8586927Z fn() 2025-12-04T13:24:33.8587076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8587117Z method(*args, **kwargs) 2025-12-04T13:24:33.8587267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8587309Z method(*args, **kwargs) 2025-12-04T13:24:33.8587458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8587498Z with policy(): 2025-12-04T13:24:33.8587651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8587693Z raise RuntimeError(msg) 2025-12-04T13:24:33.8588043Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 
2025-12-04T13:24:33.8588045Z 2025-12-04T13:24:33.8588118Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8588348Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8588350Z 2025-12-04T13:24:33.8588446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8588510Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8588573Z ====================== 1 failed, 18 deselected in 11.66s ======================= 2025-12-04T13:24:33.8588612Z Got exit code 1 2025-12-04T13:24:33.8588652Z Retrying single test... 2025-12-04T13:24:33.8588843Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0c0c0fd4cda95cbd.xml 2025-12-04T13:24:33.8588898Z ============================= test session starts ============================== 2025-12-04T13:24:33.8589011Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8589052Z cachedir: .pytest_cache 2025-12-04T13:24:33.8589221Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8589270Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8589322Z configfile: pytest.ini 2025-12-04T13:24:33.8589483Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8589572Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8589839Z stepcurrent: skipping 18 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8589883Z Running 1 items in this shard 2025-12-04T13:24:33.8589885Z 2025-12-04T13:24:33.8590194Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 13:22:57.841000 470976 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 471045 2025-12-04T13:24:33.8590348Z I1204 13:22:57.842000 470976 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 471046 2025-12-04T13:24:33.8590501Z I1204 13:22:57.842000 470976 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 471047 2025-12-04T13:24:33.8590650Z I1204 13:22:57.843000 470976 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 471048 2025-12-04T13:24:33.8591007Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8591055Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8591407Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8591455Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8591805Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8591852Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8592199Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8592244Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8592840Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8592881Z _warn_cpu_init() 2025-12-04T13:24:33.8593171Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:24:33.8593214Z return func(*args, **kwargs) 2025-12-04T13:24:33.8593792Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8593851Z _warn_cpu_init() 2025-12-04T13:24:33.8594426Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8594464Z _warn_cpu_init() 2025-12-04T13:24:33.8595028Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:24:33.8595068Z _warn_cpu_init() 2025-12-04T13:24:33.8595210Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8595373Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8595661Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8595817Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8596103Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8596229Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8596509Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8596656Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8596944Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8597092Z [rank2]:E1204 
13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8597372Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8597508Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8597785Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8597943Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8598421Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 2025-12-04T13:24:33.8598559Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8598752Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8599111Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8599226Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8599437Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8599603Z [rank2]:E1204 13:23:07.472000 471047 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8599642Z dist init r=2, world=4 2025-12-04T13:24:33.8599826Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8599984Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8600273Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8600427Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8600712Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8600836Z 
[rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8601112Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8601272Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8601547Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8601694Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8601972Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8602122Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8602400Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8602563Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8603050Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296. 
2025-12-04T13:24:33.8603163Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8603361Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8603717Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8603831Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8604041Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8604206Z [rank0]:E1204 13:23:07.476000 471045 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8604346Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8604505Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8604792Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8604945Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8605229Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8605354Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8605639Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8605788Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8606062Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8606207Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8606494Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8606642Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8606920Z [rank1]:E1204 13:23:07.476000 471046 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8607077Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8607550Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:24:33.8607664Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8607860Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8608216Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8608329Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8608541Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8608704Z [rank1]:E1204 13:23:07.476000 471046 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8608745Z dist init r=0, world=4 2025-12-04T13:24:33.8608784Z dist init r=1, world=4 2025-12-04T13:24:33.8608923Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8609081Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8609366Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8609518Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8609861Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8609984Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8610260Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8610406Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:24:33.8610702Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8610850Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8611141Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8611290Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8611567Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8611714Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8612186Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:24:33.8612301Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8612496Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8612850Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:24:33.8612964Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8613175Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8613340Z [rank3]:E1204 13:23:07.478000 471048 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8613379Z dist init r=3, world=4 2025-12-04T13:24:33.8613714Z [rank0]:[W1204 13:23:07.191386406 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8613755Z FAILED [11.6171s] [100%] 2025-12-04T13:24:33.8613757Z 2025-12-04T13:24:33.8613813Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8613924Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____ 2025-12-04T13:24:33.8613971Z Traceback (most recent call last): 2025-12-04T13:24:33.8614133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8614177Z self._join_processes(fn) 2025-12-04T13:24:33.8614349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8614402Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8614579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8614622Z raise RuntimeError(error) 2025-12-04T13:24:33.8614712Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8614757Z Traceback (most recent call last): 2025-12-04T13:24:33.8614919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8614973Z getattr(self, test_name)() 2025-12-04T13:24:33.8615141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8615176Z fn() 2025-12-04T13:24:33.8615327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8615368Z method(*args, **kwargs) 2025-12-04T13:24:33.8615517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8615557Z method(*args, **kwargs) 2025-12-04T13:24:33.8615708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8615747Z with policy(): 2025-12-04T13:24:33.8615898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8615942Z raise RuntimeError(msg) 2025-12-04T13:24:33.8616294Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 
2025-12-04T13:24:33.8616296Z 
2025-12-04T13:24:33.8616372Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8616600Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8616603Z 
2025-12-04T13:24:33.8616692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8616695Z 
2025-12-04T13:24:33.8616697Z 
2025-12-04T13:24:33.8616772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.8616860Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T13:24:33.8617094Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0c0c0fd4cda95cbd.xml -
2025-12-04T13:24:33.8617153Z =========================== short test summary info ============================
2025-12-04T13:24:33.8617400Z FAILED [11.6171s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T13:24:33.8617446Z Traceback (most recent call last):
2025-12-04T13:24:33.8617612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8617665Z getattr(self, test_name)()
2025-12-04T13:24:33.8617826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8617861Z fn()
2025-12-04T13:24:33.8618014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8618054Z method(*args, **kwargs)
2025-12-04T13:24:33.8618206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8618245Z method(*args, **kwargs)
2025-12-04T13:24:33.8618395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8618432Z with policy():
2025-12-04T13:24:33.8618592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8618634Z raise RuntimeError(msg)
2025-12-04T13:24:33.8618997Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416.
2025-12-04T13:24:33.8619014Z 
2025-12-04T13:24:33.8619088Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8619315Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8619317Z 
2025-12-04T13:24:33.8619404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8619467Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
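The mem_leak_check failures above come from a context manager that snapshots CUDA memory around each test body and raises if usage grew. The sketch below only illustrates that before/after comparison with public torch.cuda counters; it is a simplified stand-in, not the actual leak checker in torch/testing/_internal/common_utils.py (which also consults driver-level counters, as the error text shows), and the LeakCheck name is hypothetical.

    import torch

    # Minimal sketch of the leak check's before/after comparison. The real
    # check additionally compares CUDA driver allocations and handles
    # per-device contexts; this version only reads the caching allocator.
    class LeakCheck:
        def __enter__(self):
            torch.cuda.synchronize()
            self.before = [
                torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())
            ]
            return self

        def __exit__(self, exc_type, exc, tb):
            if exc_type is not None:
                return False  # let the test's own exception propagate unchanged
            torch.cuda.synchronize()
            for device, before in enumerate(self.before):
                after = torch.cuda.memory_allocated(device)
                if after > before:
                    raise RuntimeError(
                        f"possible leak on device {device}: {before} -> {after} bytes"
                    )
            return False

Read through that lens, the failure text reports both views: the caching allocator grew from 512 bytes to roughly 80-148 KB per device, while the driver-level total grew by about 1.5 GB, and either growth is enough to fail the check.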
2025-12-04T13:24:33.8619531Z ====================== 1 failed, 20 deselected in 11.75s =======================
2025-12-04T13:24:33.8619569Z Got exit code 1
2025-12-04T13:24:33.8619610Z Retrying single test...
2025-12-04T13:24:33.8619849Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-976919b04418c9ea.xml
2025-12-04T13:24:33.8619908Z ============================= test session starts ==============================
2025-12-04T13:24:33.8620019Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.8620061Z cachedir: .pytest_cache
2025-12-04T13:24:33.8620217Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.8620264Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.8620305Z configfile: pytest.ini
2025-12-04T13:24:33.8620469Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.8620544Z collecting ... collected 60 items / 20 deselected / 40 selected
2025-12-04T13:24:33.8620767Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8620812Z Running 1 items in this shard
2025-12-04T13:24:33.8620814Z 
2025-12-04T13:24:33.8621117Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 13:23:11.959000 471378 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 471447
2025-12-04T13:24:33.8621271Z I1204 13:23:11.960000 471378 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 471448
2025-12-04T13:24:33.8621422Z I1204 13:23:11.960000 471378 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 471449
2025-12-04T13:24:33.8621592Z I1204 13:23:11.961000 471378 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 471450
2025-12-04T13:24:33.8621951Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T13:24:33.8622001Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.8622356Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T13:24:33.8622403Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.8625029Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T13:24:33.8625099Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.8625454Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T13:24:33.8625514Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.8626094Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8626135Z _warn_cpu_init()
2025-12-04T13:24:33.8626698Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8626737Z _warn_cpu_init()
2025-12-04T13:24:33.8627302Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8627342Z _warn_cpu_init()
2025-12-04T13:24:33.8627903Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:24:33.8627941Z _warn_cpu_init()
2025-12-04T13:24:33.8628237Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
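Both kinds of UserWarning repeated above are construction-time issues in the test harness rather than part of the leak failure. A minimal sketch of the quieter construction the first warning asks for; the layer sizes (d_model=32, nhead=4, num_layers=2) are arbitrary illustration values, not the test's:

    import torch.nn as nn

    # batch_first=True makes encoder_layer.self_attn.batch_first True, so the
    # enable_nested_tensor fast path stays eligible and the warning is not raised.
    layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)

The barrier() warning at the end of the block is addressed exactly as its message suggests: pass `device_id` to `init_process_group` so collectives are pinned to a device instead of inferred from the current context.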
2025-12-04T13:24:33.8628293Z return func(*args, **kwargs)
2025-12-04T13:24:33.8628439Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T13:24:33.8628602Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8628893Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8629048Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8629344Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8629472Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8629823Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8629991Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8630267Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8630415Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8630690Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8630828Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8631108Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8631255Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8631739Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 114176 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552.
2025-12-04T13:24:33.8631856Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8632055Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8632419Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8632536Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8632763Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8632929Z [rank3]:E1204 13:23:21.297000 471450 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T13:24:33.8632971Z dist init r=3, world=4
2025-12-04T13:24:33.8633109Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T13:24:33.8633268Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8633554Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8633722Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8634018Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8634156Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8634434Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8634582Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8634861Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8635008Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8635284Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8635419Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8635696Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8635846Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8636323Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416.
2025-12-04T13:24:33.8636439Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8636633Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8637004Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8637118Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8637330Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8637494Z [rank1]:E1204 13:23:21.301000 471448 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T13:24:33.8637534Z dist init r=1, world=4
2025-12-04T13:24:33.8637671Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T13:24:33.8637840Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8638127Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8638293Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8638594Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8638718Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8638995Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8639143Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8639420Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8639567Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8639879Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8640017Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8640293Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8640442Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8640917Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296.
2025-12-04T13:24:33.8641030Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8641243Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8641603Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8641721Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8641930Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8642094Z [rank0]:E1204 13:23:21.303000 471447 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T13:24:33.8642149Z dist init r=0, world=4
2025-12-04T13:24:33.8642288Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T13:24:33.8642462Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8642762Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8642915Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8643197Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8643322Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8643597Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8643747Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8644024Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8644169Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8644445Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8644582Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8644862Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8645009Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8645495Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200.
2025-12-04T13:24:33.8645610Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8645804Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8646163Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8646276Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8646499Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8646675Z [rank2]:E1204 13:23:21.311000 471449 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T13:24:33.8646714Z dist init r=2, world=4
2025-12-04T13:24:33.8647060Z [rank0]:[W1204 13:23:21.995820789 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.8647102Z FAILED [11.3169s] [100%]
2025-12-04T13:24:33.8647104Z 
2025-12-04T13:24:33.8647162Z =================================== FAILURES ===================================
2025-12-04T13:24:33.8647262Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____
2025-12-04T13:24:33.8647309Z Traceback (most recent call last):
2025-12-04T13:24:33.8647475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.8647520Z self._join_processes(fn)
2025-12-04T13:24:33.8647692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.8647746Z self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.8647923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.8647967Z raise RuntimeError(error)
2025-12-04T13:24:33.8648046Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8648093Z Traceback (most recent call last):
2025-12-04T13:24:33.8648252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8648296Z getattr(self, test_name)()
2025-12-04T13:24:33.8648454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8648491Z fn()
2025-12-04T13:24:33.8648641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8648684Z method(*args, **kwargs)
2025-12-04T13:24:33.8648833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8648874Z method(*args, **kwargs)
2025-12-04T13:24:33.8649024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8649061Z with policy():
2025-12-04T13:24:33.8649212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8649254Z raise RuntimeError(msg)
2025-12-04T13:24:33.8649614Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296.
2025-12-04T13:24:33.8649619Z 
2025-12-04T13:24:33.8649729Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8649961Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8649964Z 
2025-12-04T13:24:33.8650051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8650053Z 
2025-12-04T13:24:33.8650114Z Process 1 exited with error code 10 and exception:
2025-12-04T13:24:33.8650159Z Traceback (most recent call last):
2025-12-04T13:24:33.8650340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8650397Z getattr(self, test_name)()
2025-12-04T13:24:33.8650555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8650607Z fn()
2025-12-04T13:24:33.8650757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8650798Z method(*args, **kwargs)
2025-12-04T13:24:33.8650946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8650987Z method(*args, **kwargs)
2025-12-04T13:24:33.8651134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8651172Z with policy():
2025-12-04T13:24:33.8651323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8651366Z raise RuntimeError(msg)
2025-12-04T13:24:33.8651711Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416.
2025-12-04T13:24:33.8651715Z 
2025-12-04T13:24:33.8651790Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8652016Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8652019Z 
2025-12-04T13:24:33.8652106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8652109Z 
2025-12-04T13:24:33.8652112Z 
2025-12-04T13:24:33.8652189Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.8652279Z Process 0 terminated with exit code 10, terminating remaining processes.
2025-12-04T13:24:33.8652510Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-976919b04418c9ea.xml -
2025-12-04T13:24:33.8652572Z =========================== short test summary info ============================
2025-12-04T13:24:33.8652819Z FAILED [11.3169s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8652865Z Traceback (most recent call last):
2025-12-04T13:24:33.8653029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8653071Z getattr(self, test_name)()
2025-12-04T13:24:33.8653244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8653280Z fn()
2025-12-04T13:24:33.8653430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8653471Z method(*args, **kwargs)
2025-12-04T13:24:33.8653620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8653659Z method(*args, **kwargs)
2025-12-04T13:24:33.8653807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8653844Z with policy():
2025-12-04T13:24:33.8653995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8654047Z raise RuntimeError(msg)
2025-12-04T13:24:33.8654397Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296.
2025-12-04T13:24:33.8654425Z 
2025-12-04T13:24:33.8654499Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8654725Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8654727Z 
2025-12-04T13:24:33.8654814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8654816Z 
2025-12-04T13:24:33.8654874Z Process 1 exited with error code 10 and exception:
2025-12-04T13:24:33.8654920Z Traceback (most recent call last):
2025-12-04T13:24:33.8655081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8655125Z getattr(self, test_name)()
2025-12-04T13:24:33.8655283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8655320Z fn()
2025-12-04T13:24:33.8655471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8655512Z method(*args, **kwargs)
2025-12-04T13:24:33.8655660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8655701Z method(*args, **kwargs)
2025-12-04T13:24:33.8655849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8655886Z with policy():
2025-12-04T13:24:33.8656038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8656080Z raise RuntimeError(msg)
2025-12-04T13:24:33.8656427Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416.
2025-12-04T13:24:33.8656430Z 
2025-12-04T13:24:33.8656503Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8656728Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8656730Z 
2025-12-04T13:24:33.8656815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8656879Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
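The ProcessGroupNCCL shutdown warning earlier in this retry ("destroy_process_group() was not called before program exit") is actionable independently of the leak failure: tear the group down explicitly before the process exits. A minimal sketch of that pattern, assuming a torchrun-style launch that supplies RANK/WORLD_SIZE/MASTER_ADDR in the environment; main and the elided body are placeholders, not the test's code:

    import torch
    import torch.distributed as dist

    def main() -> None:
        # rank and world size are read from the launcher's environment variables
        dist.init_process_group(backend="nccl")
        torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
        try:
            ...  # training or test body
        finally:
            # explicit teardown releases NCCL communicators and avoids the
            # "destroy_process_group() was not called before program exit" warning
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()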
2025-12-04T13:24:33.8656952Z ====================== 1 failed, 20 deselected in 11.47s =======================
2025-12-04T13:24:33.8656993Z Got exit code 1
2025-12-04T13:24:33.8657168Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda
2025-12-04T13:24:33.8657297Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T13:24:33.8657484Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2182f2be15244c13.xml
2025-12-04T13:24:33.8657543Z ============================= test session starts ==============================
2025-12-04T13:24:33.8657657Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.8657698Z cachedir: .pytest_cache
2025-12-04T13:24:33.8657868Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.8657915Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.8657968Z configfile: pytest.ini
2025-12-04T13:24:33.8658129Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.8658215Z collecting ... collected 60 items / 19 deselected / 41 selected
2025-12-04T13:24:33.8658268Z stepcurrent: skipping 19 already run items.
2025-12-04T13:24:33.8658313Z Running 2 items in this shard
2025-12-04T13:24:33.8658315Z 
2025-12-04T13:24:33.8658612Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 13:23:25.559000 471780 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 471849
2025-12-04T13:24:33.8658768Z I1204 13:23:25.560000 471780 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 471850
2025-12-04T13:24:33.8658920Z I1204 13:23:25.560000 471780 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 471851
2025-12-04T13:24:33.8659071Z I1204 13:23:25.561000 471780 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 471852
2025-12-04T13:24:33.8659431Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T13:24:33.8659480Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.8660031Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.8660094Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.8660450Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T13:24:33.8660498Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.8660989Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.8661051Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.8661417Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T13:24:33.8661466Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.8661948Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.8662008Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.8662371Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)
2025-12-04T13:24:33.8662432Z self.encoder = TransformerEncoder(
2025-12-04T13:24:33.8662916Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
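The FSDP warnings interleaved above (the unindexed `device_id` here, and the CPU-init warning in the earlier test's block) share one fix: set the device explicitly and hand FSDP an indexed device, which also moves sharding initialization onto the GPU. A short sketch under those assumptions, with nn.Linear standing in for the test's transformer:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    dist.init_process_group(backend="nccl")  # assumes a torchrun-style launch
    rank = dist.get_rank()
    torch.cuda.set_device(rank)  # make the implicit "current device" explicit

    model = torch.nn.Linear(8, 8)  # stand-in module, constructed on CPU
    # An indexed device_id avoids the "does not have an explicit index" warning
    # and runs FSDP's sharding init on the GPU instead of the _warn_cpu_init path.
    fsdp_model = FSDP(model, device_id=torch.device("cuda", rank))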
2025-12-04T13:24:33.8662988Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.8663133Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T13:24:33.8663296Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8663586Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8663742Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8664029Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8664156Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8664436Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8664585Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8664866Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8665013Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8665289Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8665437Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8665713Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8665862Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8666332Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3401580544.
2025-12-04T13:24:33.8666458Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8666656Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8667015Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda
2025-12-04T13:24:33.8667142Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8667355Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8667518Z [rank2]:E1204 13:23:32.856000 471851 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T13:24:33.8667559Z dist init r=2, world=4
2025-12-04T13:24:33.8667697Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T13:24:33.8667857Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8668142Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8668296Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8668580Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8668705Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8668982Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8669130Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8669411Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8669560Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8669891Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8670027Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8670305Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8670452Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8670940Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 3351248896.
2025-12-04T13:24:33.8671068Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8671277Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8671623Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda
2025-12-04T13:24:33.8671736Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8671950Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8672115Z [rank3]:E1204 13:23:32.863000 471852 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T13:24:33.8672155Z dist init r=3, world=4
2025-12-04T13:24:33.8672293Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T13:24:33.8672453Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8672738Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8672892Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8673175Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8673300Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T13:24:33.8673576Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8673724Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8674011Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8674160Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T13:24:33.8674436Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8674571Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T13:24:33.8674848Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8675006Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T13:24:33.8675480Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3554672640.
2025-12-04T13:24:33.8675606Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8675801Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8676148Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda
2025-12-04T13:24:33.8676263Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T13:24:33.8676474Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8676639Z [rank0]:E1204 13:23:32.912000 471849 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T13:24:33.8676678Z dist init r=0, world=4
2025-12-04T13:24:33.8676816Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T13:24:33.8676975Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T13:24:33.8677261Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8677415Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T13:24:33.8677697Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8677821Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8678097Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8678255Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8678532Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8678678Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8678955Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8679103Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8679382Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8679540Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8680053Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 2317352960 and is now 3418357760. 
2025-12-04T13:24:33.8680168Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8680364Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8680710Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8680823Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8681034Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8681198Z [rank1]:E1204 13:23:32.914000 471850 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8681238Z dist init r=1, world=4 2025-12-04T13:24:33.8681278Z FAILED [8.6141s] [ 50%] 2025-12-04T13:24:33.8681280Z 2025-12-04T13:24:33.8681339Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8681437Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T13:24:33.8681483Z Traceback (most recent call last): 2025-12-04T13:24:33.8681648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8681691Z self._join_processes(fn) 2025-12-04T13:24:33.8681864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8681918Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8682095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8682141Z raise RuntimeError(error) 2025-12-04T13:24:33.8682236Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8682281Z Traceback (most recent call last): 2025-12-04T13:24:33.8682444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8682488Z getattr(self, test_name)() 2025-12-04T13:24:33.8682646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8682681Z fn() 2025-12-04T13:24:33.8682832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8682873Z method(*args, **kwargs) 2025-12-04T13:24:33.8683023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8683063Z method(*args, **kwargs) 2025-12-04T13:24:33.8683227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8683279Z with policy(): 2025-12-04T13:24:33.8683431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T13:24:33.8683486Z raise RuntimeError(msg) 2025-12-04T13:24:33.8683827Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3401580544. 2025-12-04T13:24:33.8683829Z 2025-12-04T13:24:33.8683905Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8684125Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8684127Z 2025-12-04T13:24:33.8684216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8684219Z 2025-12-04T13:24:33.8684278Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8684324Z Traceback (most recent call last): 2025-12-04T13:24:33.8684486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8684530Z getattr(self, test_name)() 2025-12-04T13:24:33.8684687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8684724Z fn() 2025-12-04T13:24:33.8684872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8684914Z method(*args, **kwargs) 2025-12-04T13:24:33.8685064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8685105Z method(*args, **kwargs) 2025-12-04T13:24:33.8685255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8685293Z with policy(): 2025-12-04T13:24:33.8685446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8685486Z raise RuntimeError(msg) 2025-12-04T13:24:33.8685828Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 3351248896. 2025-12-04T13:24:33.8685830Z 2025-12-04T13:24:33.8685904Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8686146Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8686150Z 2025-12-04T13:24:33.8686236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8686239Z 2025-12-04T13:24:33.8686242Z 2025-12-04T13:24:33.8686318Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8686404Z Process 2 terminated with exit code 10, terminating remaining processes. 
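The "exit code 10" above is the harness-internal signal that a child rank failed its test body: each rank runs in its own process, and the parent (common_distributed.py, _join_processes -> _check_return_codes in the traceback) joins all ranks and re-raises the first failing rank's exception. A minimal sketch of that parent-side pattern, with all names invented for illustration and simplified relative to the real harness:

    # Illustrative only -- not the actual common_distributed.py code.
    import multiprocessing as mp

    TEST_FAIL_EXIT_CODE = 10  # matches the "exiting process N with exit code: 10" lines above

    def _child(rank, world_size, fn):
        try:
            fn(rank, world_size)              # per-rank test body
        except Exception:
            raise SystemExit(TEST_FAIL_EXIT_CODE)

    def run_multiprocess_test(fn, world_size=4):
        ctx = mp.get_context("spawn")         # CUDA/HIP requires the spawn start method
        procs = [ctx.Process(target=_child, args=(r, world_size, fn))
                 for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        failed = [(p.pid, p.exitcode) for p in procs if p.exitcode != 0]
        if failed:
            raise RuntimeError(f"Process(es) exited with error code(s): {failed}")

Because every rank hit the same leak check here, which rank the parent reports first ("Process 2 terminated..." in this run, "Process 0" in the next) is arbitrary; the per-rank tracebacks above are the authoritative record.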
2025-12-04T13:24:33.8686636Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2182f2be15244c13.xml - 2025-12-04T13:24:33.8686696Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8686944Z FAILED [8.6141s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:24:33.8686993Z Traceback (most recent call last): 2025-12-04T13:24:33.8687166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8687221Z getattr(self, test_name)() 2025-12-04T13:24:33.8687378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8687413Z fn() 2025-12-04T13:24:33.8687563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8687603Z method(*args, **kwargs) 2025-12-04T13:24:33.8687751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8687791Z method(*args, **kwargs) 2025-12-04T13:24:33.8687944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8687983Z with policy(): 2025-12-04T13:24:33.8688134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8688176Z raise RuntimeError(msg) 2025-12-04T13:24:33.8688514Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3401580544. 
2025-12-04T13:24:33.8688517Z 2025-12-04T13:24:33.8688590Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8688808Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8688811Z 2025-12-04T13:24:33.8688897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8688900Z 2025-12-04T13:24:33.8688958Z Process 3 exited with error code 10 and exception: 2025-12-04T13:24:33.8689003Z Traceback (most recent call last): 2025-12-04T13:24:33.8689163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8689206Z getattr(self, test_name)() 2025-12-04T13:24:33.8689362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8689396Z fn() 2025-12-04T13:24:33.8689545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8689584Z method(*args, **kwargs) 2025-12-04T13:24:33.8689774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8689833Z method(*args, **kwargs) 2025-12-04T13:24:33.8689983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8690021Z with policy(): 2025-12-04T13:24:33.8690172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8690213Z raise RuntimeError(msg) 2025-12-04T13:24:33.8690550Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 3351248896. 2025-12-04T13:24:33.8690552Z 2025-12-04T13:24:33.8690625Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8690857Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8690873Z 2025-12-04T13:24:33.8690959Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8691021Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8691097Z ======================= 1 failed, 19 deselected in 8.78s ======================= 2025-12-04T13:24:33.8691135Z Got exit code 1 2025-12-04T13:24:33.8691176Z Retrying single test... 
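The RuntimeError itself comes from the mem_leak_check wrapper this shard runs with (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 in the printed repro command): the harness snapshots per-device memory counters on entering the test and raises from __exit__ when both the caching allocator and the driver report growth afterwards. Roughly, as a sketch of the idea rather than the actual CudaMemoryLeakCheck implementation in common_utils.py:

    # Sketch of the before/after comparison behind PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1.
    # The torch calls are real APIs; the structure is simplified for illustration.
    import gc
    import torch

    def check_cuda_leak(test_fn, device=0):
        torch.cuda.synchronize(device)
        gc.collect()
        alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator bytes
        free_before, total = torch.cuda.mem_get_info(device)  # driver view (cudaMemGetInfo)
        driver_before = total - free_before

        test_fn()

        torch.cuda.synchronize(device)
        gc.collect()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after
        # Flag a leak only when both counters grew, mirroring the message above
        # ("Caching allocator allocated memory was ... CUDA driver allocated ...").
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes")

The byte counts are identical across all three attempts in this log (device 3 grows from 2250244096 to 3351248896 bytes, about 1.0 GiB, each time), so the retries below reproduce the failure deterministically rather than exposing flakiness.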
2025-12-04T13:24:33.8691363Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-59e47faf9a4f639f.xml 2025-12-04T13:24:33.8691422Z ============================= test session starts ============================== 2025-12-04T13:24:33.8691532Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8691575Z cachedir: .pytest_cache 2025-12-04T13:24:33.8691734Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8691782Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8691824Z configfile: pytest.ini 2025-12-04T13:24:33.8691988Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8692063Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8692277Z stepcurrent: skipping 19 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8692321Z Running 1 items in this shard 2025-12-04T13:24:33.8692324Z 2025-12-04T13:24:33.8692621Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 13:23:36.575000 472174 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 472243 2025-12-04T13:24:33.8692776Z I1204 13:23:36.576000 472174 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 472244 2025-12-04T13:24:33.8692927Z I1204 13:23:36.576000 472174 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 472245 2025-12-04T13:24:33.8693077Z I1204 13:23:36.577000 472174 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 472246 2025-12-04T13:24:33.8693432Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8693482Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8693985Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8694048Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8694402Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8694449Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8694949Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8695020Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8695370Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8695429Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8695912Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8695972Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8696322Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8696370Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8696851Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.8696909Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8697055Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8697218Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8697512Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8697666Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8697951Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8698086Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8698366Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8698516Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8698791Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8698938Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8699225Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8699372Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8699668Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8699864Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8700335Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3554672640. 
2025-12-04T13:24:33.8700451Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8700648Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8700995Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8701110Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8701322Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8701487Z [rank0]:E1204 13:23:43.781000 472243 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8701527Z dist init r=0, world=4 2025-12-04T13:24:33.8701667Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8701828Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8702116Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8702270Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8702569Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8702694Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8702972Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8703119Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8703408Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8703555Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8703846Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8703995Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8704274Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8704422Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8704888Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3401580544. 2025-12-04T13:24:33.8705003Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8705198Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8705545Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8705659Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8705870Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8706034Z [rank2]:E1204 13:23:43.787000 472245 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8706074Z dist init r=2, world=4 2025-12-04T13:24:33.8706211Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8706371Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8706674Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8706827Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8707111Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8707233Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8707509Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8707666Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:24:33.8707941Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8708109Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8708384Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8708521Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8708799Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8708947Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8709412Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 3351248896. 2025-12-04T13:24:33.8709526Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8709762Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8710107Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8710224Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8710432Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8710595Z [rank3]:E1204 13:23:43.840000 472246 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8710634Z dist init r=3, world=4 2025-12-04T13:24:33.8710774Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8710946Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8711233Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8711388Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T13:24:33.8711670Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8711807Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8712084Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8712243Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8712534Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8712680Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8712956Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8713092Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8713371Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8713519Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8713983Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 2317352960 and is now 3418357760. 
2025-12-04T13:24:33.8714097Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8714293Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8714638Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8714750Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8714960Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8715133Z [rank1]:E1204 13:23:43.841000 472244 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8715173Z dist init r=1, world=4 2025-12-04T13:24:33.8715212Z FAILED [8.4158s] [100%] 2025-12-04T13:24:33.8715214Z 2025-12-04T13:24:33.8715270Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8715367Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T13:24:33.8715414Z Traceback (most recent call last): 2025-12-04T13:24:33.8715575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8715620Z self._join_processes(fn) 2025-12-04T13:24:33.8715792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8715847Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8716035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8716094Z raise RuntimeError(error) 2025-12-04T13:24:33.8716174Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8716232Z Traceback (most recent call last): 2025-12-04T13:24:33.8716391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8716433Z getattr(self, test_name)() 2025-12-04T13:24:33.8716590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8716624Z fn() 2025-12-04T13:24:33.8716776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8716816Z method(*args, **kwargs) 2025-12-04T13:24:33.8716967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8717008Z method(*args, **kwargs) 2025-12-04T13:24:33.8717157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8717196Z with policy(): 2025-12-04T13:24:33.8717348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T13:24:33.8717389Z raise RuntimeError(msg) 2025-12-04T13:24:33.8717732Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3554672640. 2025-12-04T13:24:33.8717734Z 2025-12-04T13:24:33.8717809Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8718029Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8718032Z 2025-12-04T13:24:33.8718120Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8718123Z 2025-12-04T13:24:33.8718125Z 2025-12-04T13:24:33.8718198Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8718286Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8718516Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-59e47faf9a4f639f.xml - 2025-12-04T13:24:33.8718577Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8718826Z FAILED [8.4158s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8718875Z Traceback (most recent call last): 2025-12-04T13:24:33.8719038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8719082Z getattr(self, test_name)() 2025-12-04T13:24:33.8719240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8719275Z fn() 2025-12-04T13:24:33.8719424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8719465Z method(*args, **kwargs) 2025-12-04T13:24:33.8719614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8719665Z method(*args, **kwargs) 2025-12-04T13:24:33.8719848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8719903Z with policy(): 2025-12-04T13:24:33.8720054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8720110Z raise RuntimeError(msg) 2025-12-04T13:24:33.8720449Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3554672640. 
2025-12-04T13:24:33.8720451Z 2025-12-04T13:24:33.8720524Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8720745Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8720747Z 2025-12-04T13:24:33.8720834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8720898Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8720960Z ======================= 1 failed, 20 deselected in 8.57s ======================= 2025-12-04T13:24:33.8721000Z Got exit code 1 2025-12-04T13:24:33.8721040Z Retrying single test... 2025-12-04T13:24:33.8721229Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1f9fd4c59a9bc9e4.xml 2025-12-04T13:24:33.8721285Z ============================= test session starts ============================== 2025-12-04T13:24:33.8721397Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8721439Z cachedir: .pytest_cache 2025-12-04T13:24:33.8721596Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8721644Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8721685Z configfile: pytest.ini 2025-12-04T13:24:33.8721845Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8721920Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8722137Z stepcurrent: skipping 19 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8722180Z Running 1 items in this shard 2025-12-04T13:24:33.8722182Z 2025-12-04T13:24:33.8722477Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 13:23:47.724000 472568 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 472637 2025-12-04T13:24:33.8722647Z I1204 13:23:47.725000 472568 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 472638 2025-12-04T13:24:33.8722799Z I1204 13:23:47.726000 472568 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 472639 2025-12-04T13:24:33.8722949Z I1204 13:23:47.726000 472568 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 472640 2025-12-04T13:24:33.8723308Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8723359Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8723862Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8723936Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8724298Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8724346Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8724698Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8724745Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8725232Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8725294Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8725776Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8725835Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8726187Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:24:33.8726234Z self.encoder = TransformerEncoder( 2025-12-04T13:24:33.8726720Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.8726778Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8726922Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8727097Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8727386Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8727540Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8727823Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8727958Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8728236Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8728394Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8728681Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8728827Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8729104Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8729241Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8729520Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8729668Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8730174Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 3351248896. 
2025-12-04T13:24:33.8730291Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8730487Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8730837Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8730952Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8731163Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8731341Z [rank3]:E1204 13:23:55.017000 472640 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8731382Z dist init r=3, world=4 2025-12-04T13:24:33.8731521Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8731680Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8731967Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8732120Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8732425Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8732563Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8732854Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8733001Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8733276Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8733424Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8733701Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8733840Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:24:33.8734118Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8734265Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8734732Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3401580544. 2025-12-04T13:24:33.8734848Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8735044Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8735388Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8735501Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8735724Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8735890Z [rank2]:E1204 13:23:55.022000 472639 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8735930Z dist init r=2, world=4 2025-12-04T13:24:33.8736067Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8736228Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8736526Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8736681Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8736979Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8737125Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8737400Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8737548Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:24:33.8737825Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8737972Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8738250Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8738385Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8738666Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8738814Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8739280Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3554672640. 2025-12-04T13:24:33.8739395Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8739589Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8740000Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8740114Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8740326Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8740488Z [rank0]:E1204 13:23:55.042000 472637 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8740528Z dist init r=0, world=4 2025-12-04T13:24:33.8740665Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8740845Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8741132Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8741312Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T13:24:33.8741598Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8741720Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8741997Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8742144Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8742421Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8742567Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8742840Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8742977Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8743260Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8743410Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8743873Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 2317352960 and is now 3418357760. 
2025-12-04T13:24:33.8743989Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8744194Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8744541Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8744654Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8744863Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8745041Z [rank1]:E1204 13:23:55.063000 472638 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8745079Z dist init r=1, world=4 2025-12-04T13:24:33.8745120Z FAILED [8.6140s] [100%] 2025-12-04T13:24:33.8745132Z 2025-12-04T13:24:33.8745187Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8745283Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T13:24:33.8745340Z Traceback (most recent call last): 2025-12-04T13:24:33.8745502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8745547Z self._join_processes(fn) 2025-12-04T13:24:33.8745719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8745773Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8745952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8745998Z raise RuntimeError(error) 2025-12-04T13:24:33.8746077Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8746124Z Traceback (most recent call last): 2025-12-04T13:24:33.8746283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8746328Z getattr(self, test_name)() 2025-12-04T13:24:33.8746484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8746519Z fn() 2025-12-04T13:24:33.8746670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8746712Z method(*args, **kwargs) 2025-12-04T13:24:33.8746862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8746903Z method(*args, **kwargs) 2025-12-04T13:24:33.8747053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8747092Z with policy(): 2025-12-04T13:24:33.8747244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T13:24:33.8747287Z raise RuntimeError(msg) 2025-12-04T13:24:33.8747626Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3554672640. 2025-12-04T13:24:33.8747629Z 2025-12-04T13:24:33.8747703Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8747944Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8747948Z 2025-12-04T13:24:33.8748034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8748036Z 2025-12-04T13:24:33.8748038Z 2025-12-04T13:24:33.8748113Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8748200Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8748431Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1f9fd4c59a9bc9e4.xml - 2025-12-04T13:24:33.8748490Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8748738Z FAILED [8.6140s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:24:33.8748785Z Traceback (most recent call last): 2025-12-04T13:24:33.8748949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8749004Z getattr(self, test_name)() 2025-12-04T13:24:33.8749172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8749207Z fn() 2025-12-04T13:24:33.8749358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8749399Z method(*args, **kwargs) 2025-12-04T13:24:33.8749548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8749588Z method(*args, **kwargs) 2025-12-04T13:24:33.8749782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8749822Z with policy(): 2025-12-04T13:24:33.8749973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8750015Z raise RuntimeError(msg) 2025-12-04T13:24:33.8750355Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3554672640. 
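The RuntimeError above is raised by PyTorch's CUDA memory-leak checker (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots the caching allocator's allocated bytes and the driver-reported allocation before the test body and compares them afterwards; judging from the message, an allocator delta only fails the test once the CUDA driver API confirms it. A minimal sketch of that before/after idea, assuming a CUDA-capable build (illustration only, not the actual check in common_utils.py):

    import gc
    import torch

    class CudaLeakCheck:
        """Illustrative before/after GPU-memory comparison (sketch only)."""

        def __init__(self, device: int = 0) -> None:
            self.device = device

        def _snapshot(self) -> tuple[int, int]:
            # Settle pending work and caches so the numbers are comparable.
            torch.cuda.synchronize(self.device)
            gc.collect()
            torch.cuda.empty_cache()
            allocator = torch.cuda.memory_allocated(self.device)
            free, total = torch.cuda.mem_get_info(self.device)
            return allocator, total - free  # allocator bytes, driver bytes

        def __enter__(self):
            self.alloc0, self.driver0 = self._snapshot()
            return self

        def __exit__(self, exc_type, exc, tb):
            if exc_type is not None:
                return False  # never mask the test's own failure
            alloc1, driver1 = self._snapshot()
            # Flag a leak only when the driver numbers agree with the
            # caching-allocator numbers, mirroring the log message above.
            if alloc1 > self.alloc0 and driver1 > self.driver0:
                raise RuntimeError(
                    f"possible leak: allocator {self.alloc0} -> {alloc1}, "
                    f"driver {self.driver0} -> {driver1}")
            return False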
2025-12-04T13:24:33.8750357Z 2025-12-04T13:24:33.8750432Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8750651Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8750654Z 2025-12-04T13:24:33.8750740Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8750803Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8750865Z ======================= 1 failed, 20 deselected in 8.77s ======================= 2025-12-04T13:24:33.8750903Z Got exit code 1 2025-12-04T13:24:33.8751072Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T13:24:33.8751200Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:24:33.8751387Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d89b04705e944c3.xml 2025-12-04T13:24:33.8751443Z ============================= test session starts ============================== 2025-12-04T13:24:33.8751554Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8751598Z cachedir: .pytest_cache 2025-12-04T13:24:33.8751777Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8751825Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8751865Z configfile: pytest.ini 2025-12-04T13:24:33.8752026Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8752100Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8752154Z stepcurrent: skipping 20 already run items. 2025-12-04T13:24:33.8752197Z Running 1 items in this shard 2025-12-04T13:24:33.8752199Z 2025-12-04T13:24:33.8752480Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 13:23:58.635000 472962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 473031 2025-12-04T13:24:33.8752651Z I1204 13:23:58.636000 472962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 473032 2025-12-04T13:24:33.8752817Z I1204 13:23:58.636000 472962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 473033 2025-12-04T13:24:33.8752967Z I1204 13:23:58.637000 472962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 473034 2025-12-04T13:24:33.8753485Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:24:33.8753547Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8754035Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8754099Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8754586Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8754643Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8755129Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8755187Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8755477Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:24:33.8755521Z return func(*args, **kwargs) 2025-12-04T13:24:33.8755665Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8755827Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8756129Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8756285Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8756570Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8756695Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8756983Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8757131Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8757418Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8757576Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8757851Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8757986Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8758264Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8758412Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8758865Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
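The _init_utils.py UserWarnings earlier in this session spell out their own fix: make the current device explicit before FSDP initialization, or pass an indexed device rather than the bare "cuda" string as device_id. A minimal sketch of both remedies together, assuming a per-process rank variable is available (the helper name is hypothetical):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
        # Set the current device explicitly before FSDP initialization...
        torch.cuda.set_device(rank)
        # ...and pass a device with an explicit index, which avoids the
        # "device_id cuda ... does not have an explicit index" warning.
        return FSDP(model, device_id=torch.device("cuda", rank))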
2025-12-04T13:24:33.8758982Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8759790Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8760607Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8760816Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8761162Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8761423Z [rank2]:E1204 13:24:05.975000 473033 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8761494Z dist init r=2, world=4 2025-12-04T13:24:33.8762112Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8762372Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8762839Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8763085Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8763600Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8763800Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8764308Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8764587Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8765006Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8765654Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8766849Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8767205Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8767858Z 
[rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8768207Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8769226Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:24:33.8769499Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8770013Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8770759Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8771017Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8771498Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8772455Z [rank1]:E1204 13:24:05.977000 473032 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8772551Z dist init r=1, world=4 2025-12-04T13:24:33.8772871Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8773230Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8773808Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8774154Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8774682Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8774985Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8775553Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8775824Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8776332Z [rank0]:E1204 13:24:05.985000 
473031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8776602Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8777102Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8777356Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8777866Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8778132Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8778950Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:24:33.8779163Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8779523Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8780188Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8780419Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8780808Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8781106Z [rank0]:E1204 13:24:05.985000 473031 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8781186Z dist init r=0, world=4 2025-12-04T13:24:33.8781438Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8781729Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8782273Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8782578Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8783094Z [rank3]:E1204 13:24:06.017000 473034 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8783349Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8783857Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8784107Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8784482Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8784682Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8785052Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8785240Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8785612Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8785816Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8786418Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 
2025-12-04T13:24:33.8786574Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8786836Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8787295Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8787451Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8787736Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8787961Z [rank3]:E1204 13:24:06.017000 473034 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8788013Z dist init r=3, world=4 2025-12-04T13:24:33.8788493Z [rank0]:[W1204 13:24:06.704850287 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:24:33.8788570Z FAILED [9.2147s] [100%] 2025-12-04T13:24:33.8788575Z 2025-12-04T13:24:33.8788659Z =================================== FAILURES =================================== 2025-12-04T13:24:33.8788795Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________ 2025-12-04T13:24:33.8788869Z Traceback (most recent call last): 2025-12-04T13:24:33.8789090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:24:33.8789153Z self._join_processes(fn) 2025-12-04T13:24:33.8789392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:24:33.8789466Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:24:33.8789760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:24:33.8789823Z raise RuntimeError(error) 2025-12-04T13:24:33.8789941Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8790007Z Traceback (most recent call last): 2025-12-04T13:24:33.8790225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8790282Z getattr(self, test_name)() 2025-12-04T13:24:33.8790505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8790556Z fn() 2025-12-04T13:24:33.8790765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8790822Z method(*args, **kwargs) 2025-12-04T13:24:33.8791028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8791082Z method(*args, **kwargs) 2025-12-04T13:24:33.8791288Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8791339Z with policy(): 2025-12-04T13:24:33.8791547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8791601Z raise RuntimeError(msg) 2025-12-04T13:24:33.8792037Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:24:33.8792043Z 2025-12-04T13:24:33.8792150Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8792448Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8792456Z 2025-12-04T13:24:33.8792579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8792581Z 2025-12-04T13:24:33.8792584Z 2025-12-04T13:24:33.8792693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:24:33.8792822Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:24:33.8793138Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d89b04705e944c3.xml - 2025-12-04T13:24:33.8793228Z =========================== short test summary info ============================ 2025-12-04T13:24:33.8793573Z FAILED [9.2147s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:24:33.8793643Z Traceback (most recent call last): 2025-12-04T13:24:33.8793872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8793958Z getattr(self, test_name)() 2025-12-04T13:24:33.8794140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8794183Z fn() 2025-12-04T13:24:33.8794347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8794393Z method(*args, **kwargs) 2025-12-04T13:24:33.8794557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8794600Z method(*args, **kwargs) 2025-12-04T13:24:33.8794764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8794810Z with policy(): 2025-12-04T13:24:33.8794976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8795021Z raise RuntimeError(msg) 2025-12-04T13:24:33.8795369Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 
2025-12-04T13:24:33.8795374Z 2025-12-04T13:24:33.8795453Z To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8795668Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8795671Z 2025-12-04T13:24:33.8795765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8795839Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:24:33.8795907Z ======================= 1 failed, 20 deselected in 9.37s ======================= 2025-12-04T13:24:33.8795952Z Got exit code 1 2025-12-04T13:24:33.8795998Z Retrying single test... 2025-12-04T13:24:33.8796206Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-923a8fa6750bc776.xml 2025-12-04T13:24:33.8796266Z ============================= test session starts ============================== 2025-12-04T13:24:33.8796391Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:24:33.8796437Z cachedir: .pytest_cache 2025-12-04T13:24:33.8796610Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:24:33.8796663Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:24:33.8796710Z configfile: pytest.ini 2025-12-04T13:24:33.8796900Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:24:33.8796984Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:24:33.8797198Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8797247Z Running 1 items in this shard 2025-12-04T13:24:33.8797249Z 2025-12-04T13:24:33.8797552Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 13:24:10.209000 473364 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 473433 2025-12-04T13:24:33.8797717Z I1204 13:24:10.210000 473364 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 473434 2025-12-04T13:24:33.8797897Z I1204 13:24:10.210000 473364 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 473435 2025-12-04T13:24:33.8798067Z I1204 13:24:10.211000 473364 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 473436 2025-12-04T13:24:33.8798604Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8798685Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8799204Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8799278Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8799827Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8799895Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8800411Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8800472Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8800785Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8800834Z return func(*args, **kwargs) 2025-12-04T13:24:33.8800992Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8801169Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8801497Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8801663Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8801977Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8802115Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8802412Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8802586Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8802882Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8803080Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8803375Z 
[rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8803519Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8803803Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8803953Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8804406Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:24:33.8804524Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8804724Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8805052Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8805170Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8805385Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8805549Z [rank1]:E1204 13:24:17.512000 473434 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8805690Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8805860Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8806153Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8806307Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8806594Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8806718Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8807008Z [rank0]:E1204 13:24:17.512000 473433 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8807169Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8807457Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8807606Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8807881Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8808022Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8808304Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8808456Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8808906Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:24:33.8809021Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8809221Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8809547Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8809665Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8809943Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8810112Z [rank0]:E1204 13:24:17.512000 473433 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8810155Z dist init r=0, world=4 2025-12-04T13:24:33.8810209Z dist init r=1, world=4 2025-12-04T13:24:33.8810352Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8810511Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8810803Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8810956Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8811263Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8811388Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8811683Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8811845Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8812120Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8812271Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8812546Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8812687Z [rank3]:E1204 13:24:17.565000 473436 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8812966Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8813120Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8813568Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:24:33.8813684Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8813882Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8814207Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8814324Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8814549Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8814716Z [rank3]:E1204 13:24:17.565000 473436 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8814758Z dist init r=3, world=4 2025-12-04T13:24:33.8814898Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8815061Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8815349Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8815517Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8815810Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8815949Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8816225Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8816375Z [rank2]:E1204 13:24:17.596000 473435 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8816655Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8816801Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8817080Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8817215Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8817497Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8817647Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8818096Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:24:33.8818214Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8818409Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8818747Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8818861Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8819074Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8819239Z [rank2]:E1204 13:24:17.596000 473435 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8819282Z dist init r=2, world=4 2025-12-04T13:24:33.8819621Z [rank0]:[W1204 13:24:17.184313556 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.8819674Z FAILED [9.1154s] [100%]
2025-12-04T13:24:33.8819676Z 
2025-12-04T13:24:33.8819790Z =================================== FAILURES ===================================
2025-12-04T13:24:33.8819898Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________
2025-12-04T13:24:33.8819949Z Traceback (most recent call last):
2025-12-04T13:24:33.8820124Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.8820172Z     self._join_processes(fn)
2025-12-04T13:24:33.8820346Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.8820404Z     self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.8820582Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.8820630Z     raise RuntimeError(error)
2025-12-04T13:24:33.8820711Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8820761Z Traceback (most recent call last):
2025-12-04T13:24:33.8820923Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8820970Z     getattr(self, test_name)()
2025-12-04T13:24:33.8821127Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8821167Z     fn()
2025-12-04T13:24:33.8821320Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8821365Z     method(*args, **kwargs)
2025-12-04T13:24:33.8821515Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8821560Z     method(*args, **kwargs)
2025-12-04T13:24:33.8821711Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8821753Z     with policy():
2025-12-04T13:24:33.8821905Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8821952Z     raise RuntimeError(msg)
2025-12-04T13:24:33.8822279Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432.
2025-12-04T13:24:33.8822282Z 
2025-12-04T13:24:33.8822359Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8822564Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda
2025-12-04T13:24:33.8822567Z 
2025-12-04T13:24:33.8822667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8822670Z 
2025-12-04T13:24:33.8822674Z 
2025-12-04T13:24:33.8822754Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.8822843Z Process 0 terminated with exit code 10, terminating remaining processes.
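The RuntimeError above comes from the mem_leak_check mode this shard runs under (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): per-device memory counters are sampled before and after the test body, and the test fails if they grew. The "Caching allocator" figure is the allocator's own bookkeeping; the "CUDA driver allocated" figure is total minus free memory as reported by the driver. A minimal sketch of that before/after comparison using public torch.cuda APIs — not the actual CudaMemoryLeakCheck logic in common_utils.py, which is more careful about caching effects:

    import torch

    def run_with_leak_check(test_fn, device=0):
        # Flush pending kernels and release cached blocks before sampling.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator bytes
        free_before, total = torch.cuda.mem_get_info(device)  # driver view: (free, total)

        test_fn()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)

        # Fail if either view of device memory grew across the test body.
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible leak: caching allocator {alloc_before} -> {alloc_after}, "
                f"driver allocated {total - free_before} -> {total - free_after}"
            )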
2025-12-04T13:24:33.8823082Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-923a8fa6750bc776.xml -
2025-12-04T13:24:33.8823143Z =========================== short test summary info ============================
2025-12-04T13:24:33.8823368Z FAILED [9.1154s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8823418Z Traceback (most recent call last):
2025-12-04T13:24:33.8823596Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8823650Z     getattr(self, test_name)()
2025-12-04T13:24:33.8823820Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8823869Z     fn()
2025-12-04T13:24:33.8824020Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8824064Z     method(*args, **kwargs)
2025-12-04T13:24:33.8824214Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8824257Z     method(*args, **kwargs)
2025-12-04T13:24:33.8824408Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8824450Z     with policy():
2025-12-04T13:24:33.8824605Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8824650Z     raise RuntimeError(msg)
2025-12-04T13:24:33.8824973Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432.
2025-12-04T13:24:33.8824976Z 
2025-12-04T13:24:33.8825055Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8825254Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda
2025-12-04T13:24:33.8825259Z 
2025-12-04T13:24:33.8825348Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8825415Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T13:24:33.8825479Z ======================= 1 failed, 20 deselected in 9.27s =======================
2025-12-04T13:24:33.8825519Z Got exit code 1
2025-12-04T13:24:33.8825561Z Retrying single test...
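The retried session below prints the same FSDP UserWarning on every rank: `device_id` arrived as the bare device string "cuda" with no index, so FSDP falls back to whatever the current device happens to be. A hedged sketch of the setup the warning asks for, with an explicit per-rank device index — the surrounding init code is illustrative, not the test's actual harness, and assumes MASTER_ADDR/MASTER_PORT are provided by the launcher:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_for_rank(model, rank, world_size):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)         # make the current device explicit, and/or
        return FSDP(model, device_id=rank)  # pass an indexed device instead of "cuda"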
2025-12-04T13:24:33.8825751Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-154bf3dbaf892ee0.xml
2025-12-04T13:24:33.8825809Z ============================= test session starts ==============================
2025-12-04T13:24:33.8825926Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.8825968Z cachedir: .pytest_cache
2025-12-04T13:24:33.8826128Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.8826174Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.8826218Z configfile: pytest.ini
2025-12-04T13:24:33.8826381Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.8826479Z collecting ... collected 60 items / 20 deselected / 40 selected
2025-12-04T13:24:33.8826677Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda
2025-12-04T13:24:33.8826725Z Running 1 items in this shard
2025-12-04T13:24:33.8826728Z 
2025-12-04T13:24:33.8827009Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 13:24:21.804000 473766 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 473835
2025-12-04T13:24:33.8827168Z I1204 13:24:21.805000 473766 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 473836
2025-12-04T13:24:33.8827323Z I1204 13:24:21.805000 473766 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 473837
2025-12-04T13:24:33.8827486Z I1204 13:24:21.806000 473766 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 473838
2025-12-04T13:24:33.8827992Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.8828064Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.8828550Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.8828613Z device_from_device_id = _get_device_from_device_id(
2025-12-04T13:24:33.8829097Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T13:24:33.8829159Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8829642Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:24:33.8829751Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:24:33.8830043Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:24:33.8830092Z return func(*args, **kwargs) 2025-12-04T13:24:33.8830236Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8830401Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8830693Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8830863Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8831152Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8831279Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8831559Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8831710Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8832004Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8832169Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8832456Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8832596Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8832874Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8833028Z [rank0]:E1204 
13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8833475Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:24:33.8833595Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8833793Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8834125Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8834245Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8834456Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8834624Z [rank0]:E1204 13:24:29.143000 473835 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:24:33.8834664Z dist init r=0, world=4 2025-12-04T13:24:33.8834807Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8834967Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8835265Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8835425Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8835708Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8835837Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8836122Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8836274Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8836561Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8836720Z [rank2]:E1204 13:24:29.144000 473837 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8836998Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8837135Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8837415Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8837563Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8838013Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:24:33.8838128Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8838327Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8838659Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8838773Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8838987Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8839151Z [rank2]:E1204 13:24:29.144000 473837 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:24:33.8839193Z dist init r=2, world=4 2025-12-04T13:24:33.8839366Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8839530Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8839875Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8840033Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8840319Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8840463Z [rank3]:E1204 13:24:29.154000 473838 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8840743Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8840921Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8841202Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8841349Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8841629Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8841769Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8842048Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8842198Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8842644Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 
2025-12-04T13:24:33.8842763Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8842959Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8843287Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8843406Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8843618Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8843801Z [rank3]:E1204 13:24:29.154000 473838 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:24:33.8843842Z dist init r=3, world=4 2025-12-04T13:24:33.8843982Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:24:33.8844143Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:24:33.8844430Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:24:33.8844583Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:24:33.8844884Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:24:33.8845032Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:24:33.8845322Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8845473Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8845752Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:24:33.8845904Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:24:33.8846179Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:24:33.8846320Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:24:33.8846600Z 
[rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:24:33.8846748Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:24:33.8847196Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:24:33.8847313Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8847513Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:24:33.8847837Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T13:24:33.8847966Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:24:33.8848180Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:24:33.8848345Z [rank1]:E1204 13:24:29.155000 473836 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:24:33.8848388Z dist init r=1, world=4 2025-12-04T13:24:33.8848722Z [rank0]:[W1204 13:24:29.805435684 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:24:33.8848765Z FAILED [9.1136s] [100%]
2025-12-04T13:24:33.8848767Z 
2025-12-04T13:24:33.8848836Z =================================== FAILURES ===================================
2025-12-04T13:24:33.8848931Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________
2025-12-04T13:24:33.8848979Z Traceback (most recent call last):
2025-12-04T13:24:33.8849157Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:24:33.8849216Z     self._join_processes(fn)
2025-12-04T13:24:33.8849393Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:24:33.8849448Z     self._check_return_codes(fn, elapsed_time)
2025-12-04T13:24:33.8849630Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:24:33.8849677Z     raise RuntimeError(error)
2025-12-04T13:24:33.8849798Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8849849Z Traceback (most recent call last):
2025-12-04T13:24:33.8850013Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8850060Z     getattr(self, test_name)()
2025-12-04T13:24:33.8850219Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8850259Z     fn()
2025-12-04T13:24:33.8850411Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8850456Z     method(*args, **kwargs)
2025-12-04T13:24:33.8850607Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8850652Z     method(*args, **kwargs)
2025-12-04T13:24:33.8850802Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8850844Z     with policy():
2025-12-04T13:24:33.8850998Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8851044Z     raise RuntimeError(msg)
2025-12-04T13:24:33.8851365Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432.
2025-12-04T13:24:33.8851368Z 
2025-12-04T13:24:33.8851448Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8851647Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda
2025-12-04T13:24:33.8851653Z 
2025-12-04T13:24:33.8851741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8851744Z 
2025-12-04T13:24:33.8851748Z 
2025-12-04T13:24:33.8851843Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:24:33.8851933Z Process 0 terminated with exit code 10, terminating remaining processes.
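Both attempts also end with the ProcessGroupNCCL warning that destroy_process_group() was never called before the worker exited. A minimal sketch of the teardown the warning recommends, assuming the usual one-process-per-rank pattern (the function and argument names here are illustrative):

    import torch.distributed as dist

    def worker(rank, world_size):
        # Assumes MASTER_ADDR/MASTER_PORT are set in the environment.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            ...  # per-rank test or training body
        finally:
            dist.destroy_process_group()  # explicit shutdown avoids the exit-time warning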
2025-12-04T13:24:33.8852172Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-154bf3dbaf892ee0.xml -
2025-12-04T13:24:33.8852235Z =========================== short test summary info ============================
2025-12-04T13:24:33.8852459Z FAILED [9.1136s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:24:33.8852505Z Traceback (most recent call last):
2025-12-04T13:24:33.8852672Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:24:33.8852739Z     getattr(self, test_name)()
2025-12-04T13:24:33.8852905Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:24:33.8852954Z     fn()
2025-12-04T13:24:33.8853109Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8853163Z     method(*args, **kwargs)
2025-12-04T13:24:33.8853316Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:24:33.8853356Z     method(*args, **kwargs)
2025-12-04T13:24:33.8853509Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:24:33.8853550Z     with policy():
2025-12-04T13:24:33.8853702Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:24:33.8853747Z     raise RuntimeError(msg)
2025-12-04T13:24:33.8854071Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432.
2025-12-04T13:24:33.8854074Z 
2025-12-04T13:24:33.8854153Z To execute this test, run the following from the base repo dir:
2025-12-04T13:24:33.8854352Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda
2025-12-04T13:24:33.8854354Z 
2025-12-04T13:24:33.8854446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:24:33.8854510Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T13:24:33.8854577Z ======================= 1 failed, 20 deselected in 9.27s =======================
2025-12-04T13:24:33.8854615Z Got exit code 1
2025-12-04T13:24:33.8854772Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda
2025-12-04T13:24:33.8854901Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T13:24:33.8855094Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8f5ceb018490cd5f.xml
2025-12-04T13:24:33.8855156Z ============================= test session starts ==============================
2025-12-04T13:24:33.8855269Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:24:33.8855315Z cachedir: .pytest_cache
2025-12-04T13:24:33.8855474Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:24:33.8855524Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:24:33.8855566Z configfile: pytest.ini
2025-12-04T13:24:33.8855745Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:24:33.8855822Z collecting ... collected 60 items / 21 deselected / 39 selected
2025-12-04T13:24:33.8855879Z stepcurrent: skipping 21 already run items.
2025-12-04T13:24:33.8855926Z Running 0 items in this shard
2025-12-04T13:24:33.8855928Z 
2025-12-04T13:24:33.8856162Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8f5ceb018490cd5f.xml -
2025-12-04T13:24:33.8856221Z ============================ 21 deselected in 0.01s ============================
2025-12-04T13:24:33.8859813Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda',
'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda'] 2025-12-04T13:24:33.8859857Z 2025-12-04T13:24:33.8860047Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 3/3 (test/test-reports/distributed.fsdp.test_fsdp_core_3.3_fbe45a0587bc369b_.log) 2025-12-04T13:24:33.8860049Z 2025-12-04T13:24:33.8860174Z Finished distributed/fsdp/test_fsdp_core 3/3 ... [2025-12-04 13:24:33.630397][2239816.105424321], took 21.78min 2025-12-04T13:24:33.8860447Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T13:24:33.8860537Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:24:33.8860651Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T13:24:33.8860702Z Uploading artifacts took 0.00 seconds 2025-12-04T13:24:33.8860758Z distributed/fsdp/test_fsdp_core 3/3 failed! 2025-12-04T13:24:33.8860857Z Running distributed/test_c10d_ucc 1/1 ... [2025-12-04 13:24:33.633683][2239816.108712957] 2025-12-04T13:24:33.8860910Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:24:33.8861235Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_c10d_ucc.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:24:33.633843] 2025-12-04T13:24:34.5526279Z 2025-12-04T13:24:34.5526957Z distributed/test_c10d_ucc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_ucc_1.1_d65e4f9eb5646af7_.log 2025-12-04T13:24:34.5527719Z 2025-12-04T13:24:34.5527884Z Finished distributed/test_c10d_ucc 1/1 ... [2025-12-04 13:24:34.552341][2239817.027371263], took 0.02min 2025-12-04T13:24:34.5537551Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T13:24:34.5548182Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:24:34.5549652Z Running distributed/fsdp/test_fsdp_use_orig_params 1/1 ... 
[2025-12-04 13:24:34.554884][2239817.029913873] 2025-12-04T13:24:34.5549952Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:24:34.5551104Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/fsdp/test_fsdp_use_orig_params.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:24:34.555007] 2025-12-04T13:30:01.3746966Z 2025-12-04T13:30:01.3747359Z distributed/fsdp/test_fsdp_use_orig_params 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_use_orig_params_1.1_2c6e28ea164ab14c_.log 2025-12-04T13:30:01.3752992Z Running 25 items in this shard: test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_diff_hyperparams_cpu_offload_sharding_strategy_str_full_shard, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_diff_hyperparams_cpu_offload_sharding_strategy_str_no_shard, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_diff_hyperparams_cpu_offload_sharding_strategy_str_shard_grad_op, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_diff_hyperparams_sharding_strategy_str_full_shard, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_diff_hyperparams_sharding_strategy_str_no_shard, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_diff_hyperparams_sharding_strategy_str_shard_grad_op, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_diff_trainability, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_fsdp_compile, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsMultipleParamGroups::test_multiple_optimizers, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsUnshardReshard::test_multiple_forward_offload_params_False, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsUnshardReshard::test_multiple_forward_offload_params_True, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsUnshardReshard::test_summon_between_two_forwards_offload_params_False, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsUnshardReshard::test_summon_between_two_forwards_offload_params_True, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsParamAccess::test_access_params_after_forward, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsWriteback::test_grad_writeback, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsWriteback::test_no_reshard_and_mixed_precision, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsWriteback::test_param_writeback, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsWriteback::test_writeback_between_fwd_and_bwd_for_no_reshard_raises, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsWriteback::test_writeback_shape_mismatch, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsFQNs::test_named_parameters_in_forward, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsNoSync::test_no_sync_correctness, 
test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsNoSync::test_no_sync_mixed_precision, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestFSDPUseOrigParamsInit::test_non_uniform_requires_grad, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestMultiTensorApply::test_multi_tensor_apply_size0_tensors_cpu, test/distributed/fsdp/test_fsdp_use_orig_params.py::TestMultiTensorApply::test_multi_tensor_apply_size0_tensors_cuda 2025-12-04T13:30:01.3757811Z 2025-12-04T13:30:01.3757958Z Finished distributed/fsdp/test_fsdp_use_orig_params 1/1 ... [2025-12-04 13:30:01.374609][2240143.849634417], took 5.45min 2025-12-04T13:30:01.3770493Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T13:30:01.3784156Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:30:01.3785824Z Running distributed/test_c10d_common 1/1 ... [2025-12-04 13:30:01.378507][2240143.853536838] 2025-12-04T13:30:01.3786025Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:30:01.3787690Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_c10d_common.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:30:01.378671] 2025-12-04T13:32:12.0809390Z 2025-12-04T13:32:12.0810195Z distributed/test_c10d_common 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_common_1.1_a8ed353e12f4e416_.log 2025-12-04T13:32:12.0816279Z Running 27 items in this shard: test/distributed/test_c10d_common.py::TimeoutTest::test_store_based_barrier, test/distributed/test_c10d_common.py::ComputeBucketAssignmentTest::test_multi_limit_multi_dtype, test/distributed/test_c10d_common.py::ComputeBucketAssignmentTest::test_multi_limit_single_dtype, test/distributed/test_c10d_common.py::ComputeBucketAssignmentTest::test_single_limit_multi_dtype, test/distributed/test_c10d_common.py::ComputeBucketAssignmentTest::test_single_limit_single_dtype, test/distributed/test_c10d_common.py::CommTest::test_debug_level, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_abort, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_backend_class_attr, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_backend_config, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_canonicalize_helper, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_collectives, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_get_backend_name, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_init_process_group_with_multiple_backends, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_is_backend_available, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_send_recv, test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_shutdown, test/distributed/test_c10d_common.py::ProcessGroupWithDispatchedCollectivesTests::test_default_process_group, test/distributed/test_c10d_common.py::ProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends, 
test/distributed/test_c10d_common.py::ProcessGroupWithDispatchedCollectivesTests::test_init_process_group_optional_backend, test/distributed/test_c10d_common.py::ReduceOpTest::test_op_isinstance_of_reduceop, test/distributed/test_c10d_common.py::ReduceOpTest::test_reduceop_copyable, test/distributed/test_c10d_common.py::ReduceOpTest::test_reduceop_equal, test/distributed/test_c10d_common.py::ReduceOpTest::test_reduceop_pickle, test/distributed/test_c10d_common.py::LocalRankTest::testNodeLocalRank, test/distributed/test_c10d_common.py::LocalRankTest::testNodeLocalRankOverridesFallback, test/distributed/test_c10d_common.py::LocalRankTest::testWithoutEnv, test/distributed/test_c10d_common.py::LocalRankTest::testWithoutEnvWithFallback
2025-12-04T13:32:12.0821384Z Running 1 items in this shard: test/distributed/test_c10d_common.py::TimeoutTest::test_store_based_barrier
2025-12-04T13:32:12.0821874Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ComputeBucketAssignmentTest::test_multi_limit_multi_dtype
2025-12-04T13:32:12.0822361Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ComputeBucketAssignmentTest::test_multi_limit_single_dtype
2025-12-04T13:32:12.0822801Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ComputeBucketAssignmentTest::test_single_limit_multi_dtype
2025-12-04T13:32:12.0823266Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ComputeBucketAssignmentTest::test_single_limit_single_dtype
2025-12-04T13:32:12.0823657Z Running 1 items in this shard: test/distributed/test_c10d_common.py::CommTest::test_debug_level
2025-12-04T13:32:12.0824022Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_abort
2025-12-04T13:32:12.0824438Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_backend_class_attr
2025-12-04T13:32:12.0824867Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_backend_config
2025-12-04T13:32:12.0825300Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_canonicalize_helper
2025-12-04T13:32:12.0825726Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_collectives
2025-12-04T13:32:12.0826146Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_get_backend_name
2025-12-04T13:32:12.0826629Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_init_process_group_with_multiple_backends
2025-12-04T13:32:12.0827115Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_is_backend_available
2025-12-04T13:32:12.0827539Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_send_recv
2025-12-04T13:32:12.0827948Z Running 1 items in this shard: test/distributed/test_c10d_common.py::PythonProcessGroupExtensionTest::test_shutdown
2025-12-04T13:32:12.0828400Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ProcessGroupWithDispatchedCollectivesTests::test_default_process_group
2025-12-04T13:32:12.0828945Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends
2025-12-04T13:32:12.0829391Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ProcessGroupWithDispatchedCollectivesTests::test_init_process_group_optional_backend
2025-12-04T13:32:12.0829833Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ReduceOpTest::test_op_isinstance_of_reduceop
2025-12-04T13:32:12.0830136Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ReduceOpTest::test_reduceop_copyable
2025-12-04T13:32:12.0830421Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ReduceOpTest::test_reduceop_equal
2025-12-04T13:32:12.0830705Z Running 1 items in this shard: test/distributed/test_c10d_common.py::ReduceOpTest::test_reduceop_pickle
2025-12-04T13:32:12.0830992Z Running 1 items in this shard: test/distributed/test_c10d_common.py::LocalRankTest::testNodeLocalRank
2025-12-04T13:32:12.0831296Z Running 1 items in this shard: test/distributed/test_c10d_common.py::LocalRankTest::testNodeLocalRankOverridesFallback
2025-12-04T13:32:12.0831600Z Running 1 items in this shard: test/distributed/test_c10d_common.py::LocalRankTest::testWithoutEnv
2025-12-04T13:32:12.0831918Z Running 1 items in this shard: test/distributed/test_c10d_common.py::LocalRankTest::testWithoutEnvWithFallback
2025-12-04T13:32:12.0832091Z 
2025-12-04T13:32:12.0832214Z Finished distributed/test_c10d_common 1/1 ... [2025-12-04 13:32:12.080943][2240274.555968827], took 2.18min
2025-12-04T13:32:12.0832671Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml
2025-12-04T13:32:12.0844690Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T13:32:12.0847253Z Running distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 ... [2025-12-04 13:32:12.084646][2240274.559676212]
2025-12-04T13:32:12.0847485Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T13:32:12.0849265Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/_shard/sharded_tensor/test_sharded_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ...
[2025-12-04 13:32:12.084815] 2025-12-04T13:36:30.1945064Z 2025-12-04T13:36:30.1945797Z distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_1.1_5d181f6d8c27b9f4_.log 2025-12-04T13:36:30.1960428Z Running 74 items in this shard: test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorMetadata::test_serialize_and_deserialize, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorFromParams::test_empty, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardParameter::test_shard_parameter, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardParameter::test_shard_parameter_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor_with_empty_shard, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestModuleHookApi::test_collect_local_shard, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestModuleHookApi::test_reshard_output, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestLocalTensor::test_local_tensor, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestLocalTensor::test_local_tensor_error, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_cleanup, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_complete_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_like, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_full, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_ones, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_rand, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_zeros, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_gather_even, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_gather_uneven, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_insufficient_sharding_dims, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_invalid_pg_rpc_ranks, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_invalid_sharding, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_load_state_dict_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_multiple_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_partial_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharded_tensor_metadata, 
test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharded_tensor_sizes, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharding_columns, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict_no_sharded_tensors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_create_sharded_tensor_with_ones, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_gather_even, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_gather_uneven, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_grid_sharding, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_multiple_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_partial_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_device, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_cpu, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_cuda, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_test, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_uneven_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_with_rpc_names, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalTensor::test_init_from_local_tensor, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalTensor::test_init_from_local_tensor_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_invalid_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_with_all_zeros, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_with_local_view, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_pin_memory, 
test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_property_cross_ranks, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_shards_gaps, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_shards_overlap, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_with_different_glb_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_non_rw_sharded_recalc_for_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_recalc_for_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_st_base_init_from_local_shards_and_global_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op_override, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardMetadata::test_create_shard_with_no_placement, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardMetadata::test_shard_metadata_init, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorSubGroupInit::test_sub_process_group_placement_validation, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorSubGroupInit::test_sub_process_group_sharded_tensor_init, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorNoProcessGroupMode::test_init_from_local_shards_and_global_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorNoProcessGroupMode::test_non_contiguous_local_shards 2025-12-04T13:36:30.1973333Z 2025-12-04T13:36:30.1973502Z Finished distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 ... [2025-12-04 13:36:30.194707][2240532.669734405], took 4.30min 2025-12-04T13:36:30.1973982Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T13:36:30.1978390Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:36:30.1981072Z Running distributed/test_c10d_nccl 3/3 ... [2025-12-04 13:36:30.198026][2240532.673056286] 2025-12-04T13:36:30.1981272Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:36:30.1982794Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'distributed/test_c10d_nccl.py', '--shard-id=3', '--num-shards=3', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:36:30.198192] 2025-12-04T13:45:05.9118461Z 2025-12-04T13:45:05.9119528Z distributed/test_c10d_nccl 3/3 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_nccl_3.3_5340d24d88fa3c55_.log 2025-12-04T13:45:05.9133842Z Running 72 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLInitTest::test_init_wo_backend_str, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_in_destroy_mixed_empty_pgs, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_eager_init_subgroup, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_split_group_mixed_backend, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context_sync_ops, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_get_uid, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_float32, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nccl_dist_backend_error, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_non_blocking_with_eager_init, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_restart_pg, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_set_nccl_pg_timeout_backend_nccl, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_flags, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_nccl_config, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_performance, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_vs_abort_reinit_performance, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_subgroup_p2p_eager_init_False, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_arbitrary_forward_return_value_grad_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_builtin_ddp_comm_hooks_nccl, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_channels_last_contig, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_dataclass_output_unused_param, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_comm_hook_allreduce_hook_nccl, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_complex_params_and_grads, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_packed_sequence, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_failure_recovery, 
test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_info, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_compress_wrapper_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_multiple_outputs_multiple_backward_grad_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_multi_device_ids_not_allowed, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_blocking_wait_with_barrier, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_errors_blocking, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_send_recv_non_dense_tensor, test/distributed/test_c10d_nccl.py::NcclUserBufferRegistrationTest::test_nccl_user_buffer_registration, test/distributed/test_c10d_nccl.py::CommTest::test_intra_node_comm_all_reduce, test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_detail, test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_info, test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_set_default_pg_nccl, test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_complex, test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_mismatch, test/distributed/test_c10d_nccl.py::CommTest::test_time_estimate_nccl, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_float8_float8_e4m3fn, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends, test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device1_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_gather_object_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_duplicated_pg, test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_sanity_check, test/distributed/test_c10d_nccl.py::LargeCommTest::test_reduce_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_subgroup_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_False_async_op_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_allgather_uneven_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_barrier_profiling, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_batched_send_recv_op_sizes_per_coalesce0_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_individual_send_recv_op_sizes1_timing_enabled_False, 
test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_False_include_collectives_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_True_include_collectives_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_False_only_active_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_True 2025-12-04T13:45:05.9170167Z 2025-12-04T13:45:05.9170285Z Finished distributed/test_c10d_nccl 3/3 ... [2025-12-04 13:45:05.912426][2241048.38745096], took 8.60min 2025-12-04T13:45:05.9170712Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-d080382cfbad5558.xml 2025-12-04T13:45:05.9171117Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:45:05.9171344Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T13:45:05.9171526Z Uploading artifacts took 0.00 seconds 2025-12-04T13:45:07.9880308Z Running test batch 'tests to run' cost 9066.4 seconds 2025-12-04T13:45:07.9882798Z Emitting td_test_failure_stats_v2 2025-12-04T13:45:07.9886459Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764855907_7242490ed11711f092553edb3c5cc5f9 2025-12-04T13:45:10.0060779Z /var/lib/jenkins/pytorch/tools/stats/upload_metrics.py:156: UserWarning: Error uploading metric td_test_failure_stats_v2 to DynamoDB: Unable to locate credentials 2025-12-04T13:45:10.0061136Z warn(f"Error uploading metric {metric_name} to DynamoDB: {e}") 2025-12-04T13:45:10.0061787Z Emitting td_test_failure_stats_v2 2025-12-04T13:45:10.0066028Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764855910_737628f4d11711f092553edb3c5cc5f9 2025-12-04T13:45:10.0080240Z Emitting td_test_failure_stats_v2 2025-12-04T13:45:10.0080744Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764855910_73766b16d11711f092553edb3c5cc5f9 2025-12-04T13:45:10.0096257Z Emitting td_test_failure_stats_v2 2025-12-04T13:45:10.0096726Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764855910_7376ac34d11711f092553edb3c5cc5f9 2025-12-04T13:45:10.0113326Z Emitting td_test_failure_stats_v2 2025-12-04T13:45:10.0113753Z Writing 1 documents to S3 
ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764855910_7376ed2ad11711f092553edb3c5cc5f9 2025-12-04T13:45:10.0128041Z distributed/fsdp/test_fsdp_overlap 1/1 failed! 2025-12-04T13:45:10.0128416Z distributed/fsdp/test_fsdp_pure_fp16 1/1 failed! 2025-12-04T13:45:10.0128668Z distributed/fsdp/test_fsdp_apply 1/1 failed! 2025-12-04T13:45:10.0129019Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 failed! 2025-12-04T13:45:10.0129277Z distributed/fsdp/test_fsdp_core 3/3 failed! 2025-12-04T13:45:10.5894125Z 2025-12-04T13:45:10.5894442Z real 151m11.803s 2025-12-04T13:45:10.5894672Z user 895m25.436s 2025-12-04T13:45:10.5894852Z sys 383m59.826s 2025-12-04T13:45:10.5895040Z + sccache_epilogue 2025-12-04T13:45:10.5895288Z + echo '::group::Sccache Compilation Log' 2025-12-04T13:45:10.5895821Z ##[group]Sccache Compilation Log 2025-12-04T13:45:10.5896113Z + echo '=================== sccache compilation log ===================' 2025-12-04T13:45:10.5896442Z =================== sccache compilation log =================== 2025-12-04T13:45:10.5896908Z + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log 2025-12-04T13:45:10.5971964Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ===========' 2025-12-04T13:45:10.5972425Z =========== If your build fails, please take a look at the log above for possible reasons =========== 2025-12-04T13:45:10.5972747Z + sccache --show-stats 2025-12-04T13:45:10.5995530Z Compile requests 848 2025-12-04T13:45:10.5995772Z Compile requests executed 12 2025-12-04T13:45:10.5995983Z Cache hits 0 2025-12-04T13:45:10.5996194Z Cache misses 12 2025-12-04T13:45:10.5996383Z Cache misses (C/C++) 12 2025-12-04T13:45:10.5996582Z Cache hits rate 0.00 % 2025-12-04T13:45:10.5996786Z Cache hits rate (C/C++) 0.00 % 2025-12-04T13:45:10.5996984Z Cache timeouts 0 2025-12-04T13:45:10.5997180Z Cache read errors 0 2025-12-04T13:45:10.5997372Z Forced recaches 0 2025-12-04T13:45:10.5997560Z Cache write errors 0 2025-12-04T13:45:10.5997751Z Cache errors 0 2025-12-04T13:45:10.5997946Z Compilations 12 2025-12-04T13:45:10.5998142Z Compilation failures 0 2025-12-04T13:45:10.5998342Z Non-cacheable compilations 0 2025-12-04T13:45:10.5998537Z Non-cacheable calls 13 2025-12-04T13:45:10.5998735Z Non-compilation calls 823 2025-12-04T13:45:10.5998934Z Unsupported compiler calls 0 2025-12-04T13:45:10.5999139Z Average cache write 0.000 s 2025-12-04T13:45:10.5999345Z Average compiler 0.973 s 2025-12-04T13:45:10.5999549Z Average cache read hit 0.000 s 2025-12-04T13:45:10.5999891Z Failed distributed compilations 0 2025-12-04T13:45:10.6000027Z 2025-12-04T13:45:10.6000098Z Non-cacheable reasons: 2025-12-04T13:45:10.6000272Z -E 7 2025-12-04T13:45:10.6000469Z unknown source language 6 2025-12-04T13:45:10.6000597Z 2025-12-04T13:45:10.6000729Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache" 2025-12-04T13:45:10.6001081Z Use direct/preprocessor mode? yes 2025-12-04T13:45:10.6001292Z Version (client) 0.10.0 2025-12-04T13:45:10.6001498Z Cache size 711 KiB 2025-12-04T13:45:10.6001698Z Max cache size 10 GiB 2025-12-04T13:45:10.6001898Z + sccache --stop-server 2025-12-04T13:45:10.6011765Z Stopping sccache server... 
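Note on the statistics above: the cache saw essentially no compilation work during this test job. Of 848 compile requests, 823 were non-compilation calls (for example linker invocations) and only 12 real compilations ran, all of them misses, so the 0.00 % hit rate reflects an almost-idle cache rather than a caching problem. A minimal sketch for re-querying the same counters by hand, assuming the test container from the docker-exec steps below is still running (the container ID is copied from this log):

  # Sketch (assumption: CI container still up). sccache restarts its server
  # on demand, so a freshly started server would report zeroed counters.
  docker exec -t 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 \
    sh -c "sccache --show-stats"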
2025-12-04T13:45:10.6032459Z + echo ::endgroup:: 2025-12-04T13:45:10.6032786Z ##[endgroup] 2025-12-04T13:45:10.6075909Z ##[error]Process completed with exit code 1. 
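Although every suite shown in this excerpt passed, the job exits with code 1 because five FSDP test files failed earlier in the run, as the summary above reports: test_fsdp_overlap, test_fsdp_pure_fp16, test_fsdp_apply, test_hsdp_dtensor_state_dict, and shard 3/3 of test_fsdp_core. A minimal sketch for rerunning one of them, mirroring the flag set the harness logs in its 'Executing [...]' lines for distributed/test_c10d_common above; the exact flags used for the FSDP suites may differ, and the interpreter path and working directory are the CI container's:

  # Sketch: rerun a failed suite the way the harness invoked tests in this job.
  # Assumption: /var/lib/jenkins/pytorch and /opt/conda/envs/py_3.12 are the
  # CI container's checkout and interpreter; adjust for a local ROCm build.
  cd /var/lib/jenkins/pytorch/test
  /opt/conda/envs/py_3.12/bin/python -bb distributed/fsdp/test_fsdp_pure_fp16.py \
    --shard-id=1 --num-shards=1 -v --subprocess -vv -rfEX -p no:xdist \
    --use-pytest -x --reruns=0 --import-slow-tests --import-disabled-tests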
2025-12-04T13:45:10.6109415Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-12-04T13:45:10.6110136Z # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-12-04T13:45:10.6110692Z docker exec -t "80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" 2025-12-04T13:45:10.6116584Z shell: /usr/bin/bash -e {0} 2025-12-04T13:45:10.6116741Z env: 2025-12-04T13:45:10.6116881Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:10.6117088Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:10.6117320Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:45:10.6117547Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:10.6118226Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:10.6118859Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:10.6119028Z AWS_REGION: us-east-1 2025-12-04T13:45:10.6119329Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:10.6119531Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:10.6122276Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:10.6122529Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:10.6122793Z ##[endgroup] 2025-12-04T13:45:10.6886676Z ##[group]Run docker exec -t "80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67" sh -c "sudo chown -R 1001:1001 test" 2025-12-04T13:45:10.6887134Z docker exec -t "80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67" sh -c "sudo chown -R 1001:1001 test" 2025-12-04T13:45:10.6891813Z shell: /usr/bin/bash -e {0} 2025-12-04T13:45:10.6891926Z env: 2025-12-04T13:45:10.6892021Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:10.6892157Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:10.6892339Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:45:10.6892505Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:10.6893023Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:10.6893602Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:10.6893720Z AWS_REGION: us-east-1 2025-12-04T13:45:10.6893878Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:10.6894029Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:10.6896137Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:10.6896305Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:10.6896483Z ##[endgroup] 2025-12-04T13:45:10.7675667Z ##[group]Run cat test/**/*_toprint.log || true 2025-12-04T13:45:10.7675942Z cat test/**/*_toprint.log || true 2025-12-04T13:45:10.7681586Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T13:45:10.7681772Z env: 2025-12-04T13:45:10.7681902Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:10.7682080Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:10.7682317Z RUNNER_TEST_RESULTS_DIR: 
/home/runner/_work/_temp/test-results 2025-12-04T13:45:10.7682537Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:10.7683189Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:10.7683812Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:10.7683995Z AWS_REGION: us-east-1 2025-12-04T13:45:10.7684210Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:10.7684406Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:10.7687098Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:10.7687321Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:10.7687550Z ##[endgroup] 2025-12-04T13:45:10.7731316Z cat: 'test/**/*_toprint.log': No such file or directory 2025-12-04T13:45:10.7797024Z Prepare all required actions 2025-12-04T13:45:10.7797390Z Getting action download info 2025-12-04T13:45:11.1518831Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T13:45:11.9696514Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-12-04T13:45:12.9414383Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-12-04T13:45:12.9414537Z with: 2025-12-04T13:45:12.9414632Z use-gha: true 2025-12-04T13:45:12.9414795Z file-suffix: test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552 2025-12-04T13:45:12.9414977Z s3-bucket: gha-artifacts 2025-12-04T13:45:12.9415088Z env: 2025-12-04T13:45:12.9415182Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:12.9415320Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:12.9415510Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:45:12.9415692Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:12.9416198Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:12.9416688Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:12.9416803Z AWS_REGION: us-east-1 2025-12-04T13:45:12.9416961Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:12.9417111Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:12.9419239Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:12.9419405Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:12.9419582Z ##[endgroup] 2025-12-04T13:45:12.9448642Z ##[group]Run actions/upload-artifact@v4 2025-12-04T13:45:12.9448844Z with: 2025-12-04T13:45:12.9449044Z name: test-jsons-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip 2025-12-04T13:45:12.9449260Z retention-days: 14 2025-12-04T13:45:12.9449424Z if-no-files-found: warn 2025-12-04T13:45:12.9449532Z path: test/**/*.json 2025-12-04T13:45:12.9449634Z compression-level: 6 2025-12-04T13:45:12.9449789Z overwrite: false 2025-12-04T13:45:12.9449893Z include-hidden-files: false 2025-12-04T13:45:12.9450004Z env: 2025-12-04T13:45:12.9450095Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:12.9450232Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:12.9450408Z RUNNER_TEST_RESULTS_DIR: 
/home/runner/_work/_temp/test-results 2025-12-04T13:45:12.9450575Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:12.9451077Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:12.9451562Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:12.9451681Z AWS_REGION: us-east-1 2025-12-04T13:45:12.9451810Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:12.9451960Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:12.9454073Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:12.9454240Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:12.9454416Z ##[endgroup] 2025-12-04T13:45:13.3478902Z With the provided path, there will be 6 files uploaded 2025-12-04T13:45:13.3481934Z Artifact name is valid! 2025-12-04T13:45:13.3482561Z Root directory input is valid! 2025-12-04T13:45:13.5947869Z Beginning upload of artifact content to blob storage 2025-12-04T13:45:13.9588144Z Uploaded bytes 44615 2025-12-04T13:45:14.0249869Z Finished uploading artifact content to blob storage! 2025-12-04T13:45:14.0250537Z SHA256 digest of uploaded artifact zip is 69e37ca5c71c27efad288fc7d36181212105d3286be51e73032373743a7ffda8 2025-12-04T13:45:14.0251837Z Finalizing artifact upload 2025-12-04T13:45:14.1832675Z Artifact test-jsons-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip.zip successfully finalized. Artifact ID 4764668012 2025-12-04T13:45:14.1834136Z Artifact test-jsons-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip has been successfully uploaded! Final size is 44615 bytes. 
Artifact ID is 4764668012 2025-12-04T13:45:14.1836929Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922798714/artifacts/4764668012 2025-12-04T13:45:14.1961744Z ##[group]Run actions/upload-artifact@v4 2025-12-04T13:45:14.1961912Z with: 2025-12-04T13:45:14.1962165Z name: test-reports-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip 2025-12-04T13:45:14.1962444Z retention-days: 14 2025-12-04T13:45:14.1962619Z if-no-files-found: ignore 2025-12-04T13:45:14.1962774Z path: test/**/*.xml test/**/*.csv 2025-12-04T13:45:14.1962940Z compression-level: 6 2025-12-04T13:45:14.1963082Z overwrite: false 2025-12-04T13:45:14.1963223Z include-hidden-files: false 2025-12-04T13:45:14.1963372Z env: 2025-12-04T13:45:14.1963499Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:14.1963678Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:14.1963905Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:45:14.1964118Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:14.1964949Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:14.1965488Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:14.1965657Z AWS_REGION: us-east-1 2025-12-04T13:45:14.1966003Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:14.1966213Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:14.1968415Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:14.1968635Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:14.1968954Z ##[endgroup] 2025-12-04T13:45:14.6280421Z With the provided path, there will be 727 files uploaded 2025-12-04T13:45:14.6283154Z Artifact name is valid! 2025-12-04T13:45:14.6283797Z Root directory input is valid! 2025-12-04T13:45:14.8544595Z Beginning upload of artifact content to blob storage 2025-12-04T13:45:15.6052068Z Uploaded bytes 609681 2025-12-04T13:45:15.6786553Z Finished uploading artifact content to blob storage! 2025-12-04T13:45:15.6787723Z SHA256 digest of uploaded artifact zip is 7f3257da1f02c7e336f4ebca130ebdbc243f1ac441016ff1d037a7bf54fc7481 2025-12-04T13:45:15.6788397Z Finalizing artifact upload 2025-12-04T13:45:15.8545334Z Artifact test-reports-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip.zip successfully finalized. Artifact ID 4764668340 2025-12-04T13:45:15.8546614Z Artifact test-reports-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip has been successfully uploaded! Final size is 609681 bytes. 
Artifact ID is 4764668340 2025-12-04T13:45:15.8551443Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922798714/artifacts/4764668340 2025-12-04T13:45:15.8686971Z ##[group]Run actions/upload-artifact@v4 2025-12-04T13:45:15.8687134Z with: 2025-12-04T13:45:15.8687342Z name: logs-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip 2025-12-04T13:45:15.8687576Z retention-days: 14 2025-12-04T13:45:15.8687702Z if-no-files-found: ignore 2025-12-04T13:45:15.8687842Z path: usage_log.txt test/**/*.log 2025-12-04T13:45:15.8687989Z compression-level: 6 2025-12-04T13:45:15.8688117Z overwrite: false 2025-12-04T13:45:15.8688247Z include-hidden-files: false 2025-12-04T13:45:15.8688385Z env: 2025-12-04T13:45:15.8688492Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:15.8688662Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:15.8689026Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:45:15.8689221Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:15.8689974Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:15.8690511Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:15.8690647Z AWS_REGION: us-east-1 2025-12-04T13:45:15.8690846Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:15.8691021Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:15.8693219Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:15.8693400Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:15.8693591Z ##[endgroup] 2025-12-04T13:45:16.2713358Z Multiple search paths detected. Calculating the least common ancestor of all paths 2025-12-04T13:45:16.2714206Z The least common ancestor is /home/runner/_work/pytorch/pytorch. This will be the root directory of the artifact 2025-12-04T13:45:16.2714700Z With the provided path, there will be 95 files uploaded 2025-12-04T13:45:16.2716965Z Artifact name is valid! 2025-12-04T13:45:16.2717381Z Root directory input is valid! 2025-12-04T13:45:16.4921361Z Beginning upload of artifact content to blob storage 2025-12-04T13:45:17.1450062Z Uploaded bytes 754844 2025-12-04T13:45:17.2131215Z Finished uploading artifact content to blob storage! 2025-12-04T13:45:17.2132763Z SHA256 digest of uploaded artifact zip is fac56bf62ef4b6b65d3b468e7cc24c56937596b6bdda14cd5f8f5c71aef15f70 2025-12-04T13:45:17.2133463Z Finalizing artifact upload 2025-12-04T13:45:17.3560699Z Artifact logs-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip.zip successfully finalized. Artifact ID 4764668679 2025-12-04T13:45:17.3562380Z Artifact logs-runattempt1-test-distributed-1-3-linux.rocm.gpu.gfx942.4.b_57117547552.zip has been successfully uploaded! Final size is 754844 bytes. Artifact ID is 4764668679 2025-12-04T13:45:17.3564888Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922798714/artifacts/4764668679 2025-12-04T13:45:17.3693548Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T13:45:17.3693743Z # shellcheck disable=SC2156 2025-12-04T13:45:17.3693985Z find . 
-iname "core.[1-9]*" -exec docker exec "${CONTAINER_NAME}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T13:45:17.3698505Z shell: /usr/bin/bash -e {0} 2025-12-04T13:45:17.3698636Z env: 2025-12-04T13:45:17.3698751Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:17.3698906Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:17.3699105Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:45:17.3699285Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:17.3699856Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:17.3700359Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:17.3700488Z AWS_REGION: us-east-1 2025-12-04T13:45:17.3700680Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:17.3700846Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:17.3702993Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:17.3703175Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:17.3703369Z ##[endgroup] 2025-12-04T13:45:17.5052060Z ##[group]Run actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 2025-12-04T13:45:17.5052265Z with: 2025-12-04T13:45:17.5052415Z name: coredumps-distributed-1-3-linux.rocm.gpu.gfx942.4.b 2025-12-04T13:45:17.5052595Z retention-days: 14 2025-12-04T13:45:17.5052713Z if-no-files-found: ignore 2025-12-04T13:45:17.5052840Z path: ./**/core.[1-9]* 2025-12-04T13:45:17.5052965Z compression-level: 6 2025-12-04T13:45:17.5053079Z overwrite: false 2025-12-04T13:45:17.5053192Z include-hidden-files: false 2025-12-04T13:45:17.5053314Z env: 2025-12-04T13:45:17.5053416Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:45:17.5053564Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:45:17.5053754Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:45:17.5053933Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:45:17.5054469Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:45:17.5054982Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:45:17.5055111Z AWS_REGION: us-east-1 2025-12-04T13:45:17.5055288Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:45:17.5055462Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:45:17.5057626Z AWS_SESSION_TOKEN: *** 2025-12-04T13:45:17.5057808Z CONTAINER_NAME: 80ad030d602552c22ffaa1b21b3763432836139b66e1b11123254bc733ccaf67 2025-12-04T13:45:17.5058000Z ##[endgroup] 2025-12-04T13:45:21.3877168Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 2025-12-04T13:45:21.4056639Z Post job cleanup. 2025-12-04T13:45:21.4069181Z Post job cleanup. 2025-12-04T13:45:21.4270295Z Logging out of registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T13:45:21.4482170Z Post job cleanup. 2025-12-04T13:45:21.5115066Z Post job cleanup. 2025-12-04T13:45:21.5146175Z Post job cleanup. 
2025-12-04T13:45:21.5625729Z [command]/usr/bin/git version 2025-12-04T13:45:21.5654962Z git version 2.52.0 2025-12-04T13:45:21.5677787Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/dc87cd0d-6cca-4e9f-8380-a33771d6cf1d/.gitconfig' 2025-12-04T13:45:21.5683678Z Temporarily overriding HOME='/home/runner/_work/_temp/dc87cd0d-6cca-4e9f-8380-a33771d6cf1d' before making global git config changes 2025-12-04T13:45:21.5684284Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T13:45:21.5686520Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-12-04T13:45:21.5710840Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T13:45:21.5737018Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T13:45:21.5925166Z Entering 'android/libs/fbjni' 2025-12-04T13:45:21.5952654Z Entering 'third_party/FP16' 2025-12-04T13:45:21.5979878Z Entering 'third_party/FXdiv' 2025-12-04T13:45:21.6005128Z Entering 'third_party/NNPACK' 2025-12-04T13:45:21.6026390Z Entering 'third_party/NVTX' 2025-12-04T13:45:21.6051797Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T13:45:21.6078102Z Entering 'third_party/XNNPACK' 2025-12-04T13:45:21.6108306Z Entering 'third_party/aiter' 2025-12-04T13:45:21.6136031Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T13:45:21.6161198Z Entering 'third_party/benchmark' 2025-12-04T13:45:21.6185275Z Entering 'third_party/composable_kernel' 2025-12-04T13:45:21.6215105Z Entering 'third_party/cpp-httplib' 2025-12-04T13:45:21.6238999Z Entering 'third_party/cpuinfo' 2025-12-04T13:45:21.6265272Z Entering 'third_party/cudnn_frontend' 2025-12-04T13:45:21.6289642Z Entering 'third_party/cutlass' 2025-12-04T13:45:21.6318573Z Entering 'third_party/fbgemm' 2025-12-04T13:45:21.6345418Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T13:45:21.6365730Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T13:45:21.6390931Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T13:45:21.6413126Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T13:45:21.6442937Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T13:45:21.6464546Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T13:45:21.6486885Z Entering 'third_party/fbgemm/external/json' 2025-12-04T13:45:21.6511199Z Entering 'third_party/flash-attention' 2025-12-04T13:45:21.6536732Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T13:45:21.6576955Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T13:45:21.6604774Z Entering 'third_party/flatbuffers' 2025-12-04T13:45:21.6629501Z Entering 'third_party/fmt' 2025-12-04T13:45:21.6657918Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T13:45:21.6681623Z Entering 'third_party/gloo' 2025-12-04T13:45:21.6705695Z Entering 'third_party/googletest' 2025-12-04T13:45:21.6730873Z Entering 'third_party/ideep' 2025-12-04T13:45:21.6754969Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T13:45:21.6784014Z Entering 'third_party/ittapi' 2025-12-04T13:45:21.6806824Z Entering 'third_party/kineto' 2025-12-04T13:45:21.6830667Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T13:45:21.6851431Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T13:45:21.6873152Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T13:45:21.6897675Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T13:45:21.6920296Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T13:45:21.6942920Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T13:45:21.6971280Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T13:45:21.6993712Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T13:45:21.7019164Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T13:45:21.7042604Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T13:45:21.7063236Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T13:45:21.7085471Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T13:45:21.7108343Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T13:45:21.7134957Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T13:45:21.7162966Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T13:45:21.7188179Z Entering 'third_party/kleidiai' 2025-12-04T13:45:21.7213490Z Entering 'third_party/mimalloc' 2025-12-04T13:45:21.7237497Z Entering 'third_party/nlohmann' 2025-12-04T13:45:21.7261704Z Entering 'third_party/onnx' 2025-12-04T13:45:21.7291566Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T13:45:21.7317037Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T13:45:21.7342112Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T13:45:21.7370982Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T13:45:21.7393602Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T13:45:21.7415861Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T13:45:21.7443759Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T13:45:21.7463018Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T13:45:21.7482465Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T13:45:21.7502678Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T13:45:21.7534211Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T13:45:21.7562433Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T13:45:21.7594922Z Entering 'third_party/pocketfft' 2025-12-04T13:45:21.7623220Z Entering 'third_party/protobuf' 2025-12-04T13:45:21.7647906Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T13:45:21.7676844Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T13:45:21.7705625Z Entering 'third_party/psimd' 2025-12-04T13:45:21.7728644Z Entering 'third_party/pthreadpool' 2025-12-04T13:45:21.7751510Z Entering 'third_party/pybind11' 2025-12-04T13:45:21.7775459Z Entering 'third_party/python-peachpy' 2025-12-04T13:45:21.7798399Z Entering 'third_party/sleef' 2025-12-04T13:45:21.7822910Z Entering 'third_party/tensorpipe' 2025-12-04T13:45:21.7849050Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T13:45:21.7883660Z Entering 
'third_party/tensorpipe/third_party/libnop' 2025-12-04T13:45:21.7907860Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T13:45:21.7934150Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T13:45:21.7953559Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T13:45:21.7999290Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T13:45:21.8015726Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8026618Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T13:45:21.8046205Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T13:45:21.8229014Z Entering 'android/libs/fbjni' 2025-12-04T13:45:21.8244519Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8265181Z Entering 'third_party/FP16' 2025-12-04T13:45:21.8280557Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8299948Z Entering 'third_party/FXdiv' 2025-12-04T13:45:21.8316006Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8336689Z Entering 'third_party/NNPACK' 2025-12-04T13:45:21.8351099Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8369428Z Entering 'third_party/NVTX' 2025-12-04T13:45:21.8388864Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8409744Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T13:45:21.8425099Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8442946Z Entering 'third_party/XNNPACK' 2025-12-04T13:45:21.8459735Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8488510Z Entering 'third_party/aiter' 2025-12-04T13:45:21.8504265Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8526710Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T13:45:21.8538755Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8565293Z Entering 'third_party/benchmark' 2025-12-04T13:45:21.8581695Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8604094Z Entering 'third_party/composable_kernel' 2025-12-04T13:45:21.8619658Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8642690Z Entering 'third_party/cpp-httplib' 2025-12-04T13:45:21.8656575Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8677945Z Entering 'third_party/cpuinfo' 2025-12-04T13:45:21.8694370Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8709591Z Entering 'third_party/cudnn_frontend' 2025-12-04T13:45:21.8723463Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8741887Z Entering 'third_party/cutlass' 2025-12-04T13:45:21.8757007Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8788393Z Entering 'third_party/fbgemm' 2025-12-04T13:45:21.8805594Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8824331Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T13:45:21.8839251Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8857010Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T13:45:21.8869076Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8893599Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T13:45:21.8907540Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8922617Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T13:45:21.8934893Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8954085Z 
Entering 'third_party/fbgemm/external/googletest' 2025-12-04T13:45:21.8966305Z http.https://github.com/.extraheader 2025-12-04T13:45:21.8981482Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T13:45:21.8993806Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9009750Z Entering 'third_party/fbgemm/external/json' 2025-12-04T13:45:21.9021554Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9043618Z Entering 'third_party/flash-attention' 2025-12-04T13:45:21.9058640Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9077183Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T13:45:21.9089985Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9108286Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T13:45:21.9124794Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9149029Z Entering 'third_party/flatbuffers' 2025-12-04T13:45:21.9166815Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9186067Z Entering 'third_party/fmt' 2025-12-04T13:45:21.9208626Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9230974Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T13:45:21.9247346Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9265631Z Entering 'third_party/gloo' 2025-12-04T13:45:21.9280667Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9300320Z Entering 'third_party/googletest' 2025-12-04T13:45:21.9317692Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9337351Z Entering 'third_party/ideep' 2025-12-04T13:45:21.9352867Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9377354Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T13:45:21.9395643Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9425511Z Entering 'third_party/ittapi' 2025-12-04T13:45:21.9443341Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9462633Z Entering 'third_party/kineto' 2025-12-04T13:45:21.9478620Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9497077Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T13:45:21.9515099Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9538336Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T13:45:21.9554673Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9573850Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T13:45:21.9590073Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9608192Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T13:45:21.9624046Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9646764Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T13:45:21.9663409Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9682446Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T13:45:21.9699172Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9719038Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T13:45:21.9735718Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9749413Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T13:45:21.9763767Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9779132Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T13:45:21.9795103Z http.https://github.com/.extraheader 
2025-12-04T13:45:21.9816404Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T13:45:21.9829651Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9857269Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T13:45:21.9875529Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9892514Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T13:45:21.9909307Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9936974Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T13:45:21.9957709Z http.https://github.com/.extraheader 2025-12-04T13:45:21.9980858Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T13:45:21.9997000Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0017355Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T13:45:22.0032568Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0053863Z Entering 'third_party/kleidiai' 2025-12-04T13:45:22.0069207Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0088858Z Entering 'third_party/mimalloc' 2025-12-04T13:45:22.0108572Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0135185Z Entering 'third_party/nlohmann' 2025-12-04T13:45:22.0159925Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0178939Z Entering 'third_party/onnx' 2025-12-04T13:45:22.0197515Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0222932Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T13:45:22.0245967Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0268648Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T13:45:22.0284362Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0300909Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T13:45:22.0313607Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0333685Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T13:45:22.0353396Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0375060Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T13:45:22.0393709Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0412026Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T13:45:22.0433195Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0456769Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T13:45:22.0475217Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0498660Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T13:45:22.0514908Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0532916Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T13:45:22.0554428Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0578374Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T13:45:22.0595102Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0614138Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T13:45:22.0632606Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0653729Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T13:45:22.0670003Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0699953Z Entering 'third_party/pocketfft' 
2025-12-04T13:45:22.0714767Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0734477Z Entering 'third_party/protobuf' 2025-12-04T13:45:22.0750847Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0771002Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T13:45:22.0787723Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0804450Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T13:45:22.0825228Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0851835Z Entering 'third_party/psimd' 2025-12-04T13:45:22.0866567Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0883181Z Entering 'third_party/pthreadpool' 2025-12-04T13:45:22.0897973Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0915221Z Entering 'third_party/pybind11' 2025-12-04T13:45:22.0931086Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0956434Z Entering 'third_party/python-peachpy' 2025-12-04T13:45:22.0972135Z http.https://github.com/.extraheader 2025-12-04T13:45:22.0989095Z Entering 'third_party/sleef' 2025-12-04T13:45:22.1002620Z http.https://github.com/.extraheader 2025-12-04T13:45:22.1018550Z Entering 'third_party/tensorpipe' 2025-12-04T13:45:22.1031673Z http.https://github.com/.extraheader 2025-12-04T13:45:22.1052552Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T13:45:22.1064957Z http.https://github.com/.extraheader 2025-12-04T13:45:22.1085946Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T13:45:22.1108617Z http.https://github.com/.extraheader 2025-12-04T13:45:22.1129040Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T13:45:22.1144173Z http.https://github.com/.extraheader 2025-12-04T13:45:22.1162205Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T13:45:22.1174810Z http.https://github.com/.extraheader 2025-12-04T13:45:22.1194418Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T13:45:22.1207247Z http.https://github.com/.extraheader 2025-12-04T13:45:22.1243147Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.1273594Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T13:45:22.1468389Z Entering 'android/libs/fbjni' 2025-12-04T13:45:22.1480176Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T13:45:22.1492053Z Entering 'third_party/FP16' 2025-12-04T13:45:22.1503708Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T13:45:22.1515664Z Entering 'third_party/FXdiv' 2025-12-04T13:45:22.1529377Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T13:45:22.1539142Z Entering 'third_party/NNPACK' 2025-12-04T13:45:22.1550095Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T13:45:22.1564731Z Entering 'third_party/NVTX' 2025-12-04T13:45:22.1577464Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T13:45:22.1592873Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T13:45:22.1603788Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T13:45:22.1613809Z Entering 'third_party/XNNPACK' 2025-12-04T13:45:22.1623752Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T13:45:22.1638754Z Entering 'third_party/aiter' 2025-12-04T13:45:22.1649130Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T13:45:22.1661513Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T13:45:22.1674289Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T13:45:22.1695681Z Entering 'third_party/benchmark' 2025-12-04T13:45:22.1706053Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T13:45:22.1714858Z Entering 'third_party/composable_kernel' 2025-12-04T13:45:22.1725051Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T13:45:22.1741125Z Entering 'third_party/cpp-httplib' 2025-12-04T13:45:22.1753299Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T13:45:22.1766099Z Entering 'third_party/cpuinfo' 2025-12-04T13:45:22.1780986Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T13:45:22.1791524Z Entering 'third_party/cudnn_frontend' 2025-12-04T13:45:22.1801147Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T13:45:22.1810750Z Entering 'third_party/cutlass' 2025-12-04T13:45:22.1820952Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T13:45:22.1841579Z Entering 'third_party/fbgemm' 2025-12-04T13:45:22.1853121Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T13:45:22.1864784Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T13:45:22.1877205Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T13:45:22.1887091Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T13:45:22.1897825Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T13:45:22.1917399Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T13:45:22.1929551Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T13:45:22.1938895Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T13:45:22.1948895Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T13:45:22.1963524Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T13:45:22.1974099Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T13:45:22.1984700Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T13:45:22.1994770Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T13:45:22.2007721Z Entering 'third_party/fbgemm/external/json' 2025-12-04T13:45:22.2019135Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T13:45:22.2033833Z Entering 
'third_party/flash-attention' 2025-12-04T13:45:22.2044625Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T13:45:22.2056140Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T13:45:22.2071431Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T13:45:22.2087483Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T13:45:22.2100315Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T13:45:22.2117637Z Entering 'third_party/flatbuffers' 2025-12-04T13:45:22.2128148Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T13:45:22.2138579Z Entering 'third_party/fmt' 2025-12-04T13:45:22.2151274Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T13:45:22.2160557Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T13:45:22.2173727Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T13:45:22.2182868Z Entering 'third_party/gloo' 2025-12-04T13:45:22.2193613Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T13:45:22.2205185Z Entering 'third_party/googletest' 2025-12-04T13:45:22.2218178Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T13:45:22.2228290Z Entering 'third_party/ideep' 2025-12-04T13:45:22.2239924Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T13:45:22.2248989Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T13:45:22.2264289Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T13:45:22.2277166Z Entering 'third_party/ittapi' 2025-12-04T13:45:22.2288329Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T13:45:22.2299667Z Entering 'third_party/kineto' 2025-12-04T13:45:22.2310088Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T13:45:22.2320808Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T13:45:22.2330967Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T13:45:22.2341832Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T13:45:22.2359318Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T13:45:22.2369362Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T13:45:22.2380829Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T13:45:22.2391949Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T13:45:22.2403483Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T13:45:22.2415287Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T13:45:22.2426093Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T13:45:22.2436108Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T13:45:22.2450754Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T13:45:22.2462632Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T13:45:22.2475190Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T13:45:22.2483578Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T13:45:22.2495145Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T13:45:22.2506369Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T13:45:22.2520498Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T13:45:22.2534799Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T13:45:22.2547001Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T13:45:22.2557583Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T13:45:22.2574472Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T13:45:22.2584379Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T13:45:22.2595577Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T13:45:22.2610792Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T13:45:22.2622691Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T13:45:22.2638892Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T13:45:22.2649822Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T13:45:22.2659790Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T13:45:22.2670872Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T13:45:22.2687136Z Entering 'third_party/kleidiai' 2025-12-04T13:45:22.2702792Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T13:45:22.2712561Z Entering 'third_party/mimalloc' 
2025-12-04T13:45:22.2727077Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T13:45:22.2737711Z Entering 'third_party/nlohmann' 2025-12-04T13:45:22.2749752Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T13:45:22.2758300Z Entering 'third_party/onnx' 2025-12-04T13:45:22.2773472Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T13:45:22.2790159Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T13:45:22.2800659Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T13:45:22.2812893Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T13:45:22.2825724Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T13:45:22.2838387Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T13:45:22.2851310Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T13:45:22.2866315Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T13:45:22.2878941Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T13:45:22.2887839Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T13:45:22.2899315Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T13:45:22.2908344Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T13:45:22.2918637Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T13:45:22.2928752Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T13:45:22.2940525Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T13:45:22.2948624Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T13:45:22.2959165Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T13:45:22.2972224Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T13:45:22.2984263Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T13:45:22.2994483Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T13:45:22.3004566Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T13:45:22.3017145Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T13:45:22.3028948Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T13:45:22.3042569Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T13:45:22.3052909Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T13:45:22.3074149Z Entering 'third_party/pocketfft' 2025-12-04T13:45:22.3085447Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T13:45:22.3094885Z Entering 'third_party/protobuf' 2025-12-04T13:45:22.3105542Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T13:45:22.3115783Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T13:45:22.3125986Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T13:45:22.3138302Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T13:45:22.3148594Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T13:45:22.3159890Z Entering 'third_party/psimd' 2025-12-04T13:45:22.3170188Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T13:45:22.3179026Z Entering 'third_party/pthreadpool' 2025-12-04T13:45:22.3188737Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T13:45:22.3198526Z Entering 'third_party/pybind11' 2025-12-04T13:45:22.3208786Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T13:45:22.3218232Z Entering 'third_party/python-peachpy' 2025-12-04T13:45:22.3229566Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T13:45:22.3238725Z Entering 'third_party/sleef' 2025-12-04T13:45:22.3250345Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T13:45:22.3260243Z Entering 'third_party/tensorpipe' 2025-12-04T13:45:22.3271162Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T13:45:22.3283769Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T13:45:22.3294338Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T13:45:22.3304382Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T13:45:22.3315873Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T13:45:22.3325013Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T13:45:22.3335150Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T13:45:22.3343106Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T13:45:22.3357397Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T13:45:22.3366493Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T13:45:22.3381520Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T13:45:22.3414439Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3437529Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3458072Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3478182Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3499961Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3522145Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3543152Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3560158Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3578477Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3596062Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3615243Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3630744Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3644574Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3658504Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3673025Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3686982Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3703542Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3723220Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3746854Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3767203Z 
[command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3784438Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3802519Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3817503Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3837796Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3855110Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3875061Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3898744Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3907851Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3925455Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3940126Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3956473Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3970573Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.3984160Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4004871Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4020001Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4035652Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4052160Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4068235Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4083698Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4100773Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4116387Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4136621Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4156180Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4171277Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4187419Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4202665Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4217021Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4231959Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4246492Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4261954Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4276874Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4292405Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4307453Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4324037Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4340029Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4355882Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4372070Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4387967Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4403177Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4417672Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4433403Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4449241Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4466249Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4481343Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4496794Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4512352Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4528091Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4548612Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4565293Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4580682Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4595316Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4610770Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4625249Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4642463Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4658004Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4673634Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4688759Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4704538Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4719829Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4739593Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4755135Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T13:45:22.4867308Z Post job cleanup. 
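The post-job phase above is the credential scrub from actions/checkout: it unsets any core.sshCommand override and the injected http.https://github.com/.extraheader auth header from the superproject and from every vendored submodule, then audits each submodule's config file for includeIf.gitdir entries. The "Entering '<submodule>'" lines that dominate this part of the log are just the recursive foreach traversal; an entry is only interesting when it is followed by a key name such as http.https://github.com/.extraheader, which means that key was found (and unset) in that submodule. A minimal sketch of the scrub, with the commands mirrored from the log (the cd target is the workspace path shown there):

    #!/usr/bin/env bash
    # Minimal sketch of the post-job credential scrub logged above.
    # '|| :' matters: 'git config --get-regexp' exits non-zero when a key
    # is absent, which would otherwise abort the recursive traversal.
    set -u
    cd /home/runner/_work/pytorch/pytorch

    # Drop any per-repo SSH command override in each submodule.
    git submodule foreach --recursive sh -c \
      "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"

    # Remove the Authorization header injected for authenticated fetches,
    # first from the superproject, then from every submodule.
    git config --local --unset-all 'http.https://github.com/.extraheader' || :
    git submodule foreach --recursive sh -c \
      "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"

The same scrub runs once per checkout-related post-job step, which is why the full submodule traversal appears again immediately below.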
2025-12-04T13:45:22.5315623Z [command]/usr/bin/git version
2025-12-04T13:45:22.5345509Z git version 2.52.0
2025-12-04T13:45:22.5368186Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/870b31c0-910a-41a6-89c9-5e91022c1991/.gitconfig'
2025-12-04T13:45:22.5374690Z Temporarily overriding HOME='/home/runner/_work/_temp/870b31c0-910a-41a6-89c9-5e91022c1991' before making global git config changes
2025-12-04T13:45:22.5375022Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T13:45:22.5377380Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch
2025-12-04T13:45:22.5398355Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T13:45:22.5415229Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T13:45:22.5578438Z Entering 'android/libs/fbjni'
2025-12-04T13:45:22.5603108Z Entering 'third_party/FP16'
2025-12-04T13:45:22.5625009Z Entering 'third_party/FXdiv'
2025-12-04T13:45:22.5648202Z Entering 'third_party/NNPACK'
2025-12-04T13:45:22.5674405Z Entering 'third_party/NVTX'
2025-12-04T13:45:22.5708105Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T13:45:22.5746554Z Entering 'third_party/XNNPACK'
2025-12-04T13:45:22.5779030Z Entering 'third_party/aiter'
2025-12-04T13:45:22.5817274Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T13:45:22.5851925Z Entering 'third_party/benchmark'
2025-12-04T13:45:22.5877731Z Entering 'third_party/composable_kernel'
2025-12-04T13:45:22.5909207Z Entering 'third_party/cpp-httplib'
2025-12-04T13:45:22.5941206Z Entering 'third_party/cpuinfo'
2025-12-04T13:45:22.5970380Z Entering 'third_party/cudnn_frontend'
2025-12-04T13:45:22.5994083Z Entering 'third_party/cutlass'
2025-12-04T13:45:22.6021886Z Entering 'third_party/fbgemm'
2025-12-04T13:45:22.6049555Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T13:45:22.6073386Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T13:45:22.6096327Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T13:45:22.6118545Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T13:45:22.6148372Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T13:45:22.6172323Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T13:45:22.6195424Z Entering 'third_party/fbgemm/external/json'
2025-12-04T13:45:22.6221366Z Entering 'third_party/flash-attention'
2025-12-04T13:45:22.6246531Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T13:45:22.6275701Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T13:45:22.6303579Z Entering 'third_party/flatbuffers'
2025-12-04T13:45:22.6328589Z Entering 'third_party/fmt'
2025-12-04T13:45:22.6363520Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T13:45:22.6389076Z Entering 'third_party/gloo'
2025-12-04T13:45:22.6415432Z Entering 'third_party/googletest'
2025-12-04T13:45:22.6443896Z Entering 'third_party/ideep'
2025-12-04T13:45:22.6467785Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T13:45:22.6495439Z Entering 'third_party/ittapi'
2025-12-04T13:45:22.6520150Z Entering 'third_party/kineto'
2025-12-04T13:45:22.6541996Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T13:45:22.6568333Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T13:45:22.6595148Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T13:45:22.6620248Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T13:45:22.6645312Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T13:45:22.6669413Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T13:45:22.6694949Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T13:45:22.6717537Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T13:45:22.6744049Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T13:45:22.6770864Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T13:45:22.6793026Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T13:45:22.6814985Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T13:45:22.6841911Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T13:45:22.6870912Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T13:45:22.6894374Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T13:45:22.6920554Z Entering 'third_party/kleidiai'
2025-12-04T13:45:22.6945632Z Entering 'third_party/mimalloc'
2025-12-04T13:45:22.6969599Z Entering 'third_party/nlohmann'
2025-12-04T13:45:22.6993111Z Entering 'third_party/onnx'
2025-12-04T13:45:22.7022906Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T13:45:22.7048087Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T13:45:22.7072561Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T13:45:22.7102595Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T13:45:22.7125210Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T13:45:22.7149132Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T13:45:22.7183793Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T13:45:22.7210710Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T13:45:22.7233130Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T13:45:22.7254028Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T13:45:22.7280229Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T13:45:22.7304956Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T13:45:22.7334416Z Entering 'third_party/pocketfft'
2025-12-04T13:45:22.7358649Z Entering 'third_party/protobuf'
2025-12-04T13:45:22.7383934Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T13:45:22.7405240Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T13:45:22.7428464Z Entering 'third_party/psimd'
2025-12-04T13:45:22.7451567Z Entering 'third_party/pthreadpool'
2025-12-04T13:45:22.7474263Z Entering 'third_party/pybind11'
2025-12-04T13:45:22.7500261Z Entering 'third_party/python-peachpy'
2025-12-04T13:45:22.7524951Z Entering 'third_party/sleef'
2025-12-04T13:45:22.7548435Z Entering 'third_party/tensorpipe'
2025-12-04T13:45:22.7572521Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T13:45:22.7592042Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T13:45:22.7612475Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T13:45:22.7635633Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T13:45:22.7656050Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
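The same probe-and-unset pattern now repeats for the HTTP authorization header that checkout writes to enable token authentication over HTTPS. A sketch under the same assumptions; the command text is from the log, the comments are mine:

  # The persisted credential lives in each .git/config under the key
  #   http.https://github.com/.extraheader  (AUTHORIZATION: basic ***)
  # Unset it in every submodule, ignoring modules where it is absent.
  git submodule foreach --recursive sh -c \
    "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' \
     && git config --local --unset-all 'http.https://github.com/.extraheader' || :"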
2025-12-04T13:45:22.7696836Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T13:45:22.7715739Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T13:45:22.7852606Z Entering 'android/libs/fbjni'
2025-12-04T13:45:22.7875267Z Entering 'third_party/FP16'
2025-12-04T13:45:22.7896528Z Entering 'third_party/FXdiv'
2025-12-04T13:45:22.7916862Z Entering 'third_party/NNPACK'
2025-12-04T13:45:22.7938466Z Entering 'third_party/NVTX'
2025-12-04T13:45:22.7960479Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T13:45:22.7982773Z Entering 'third_party/XNNPACK'
2025-12-04T13:45:22.8012716Z Entering 'third_party/aiter'
2025-12-04T13:45:22.8047604Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T13:45:22.8081282Z Entering 'third_party/benchmark'
2025-12-04T13:45:22.8109328Z Entering 'third_party/composable_kernel'
2025-12-04T13:45:22.8139202Z Entering 'third_party/cpp-httplib'
2025-12-04T13:45:22.8164924Z Entering 'third_party/cpuinfo'
2025-12-04T13:45:22.8186644Z Entering 'third_party/cudnn_frontend'
2025-12-04T13:45:22.8208827Z Entering 'third_party/cutlass'
2025-12-04T13:45:22.8239908Z Entering 'third_party/fbgemm'
2025-12-04T13:45:22.8265020Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T13:45:22.8291732Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T13:45:22.8324071Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T13:45:22.8349296Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T13:45:22.8378914Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T13:45:22.8404819Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T13:45:22.8429835Z Entering 'third_party/fbgemm/external/json'
2025-12-04T13:45:22.8458571Z Entering 'third_party/flash-attention'
2025-12-04T13:45:22.8486668Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T13:45:22.8513645Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T13:45:22.8542917Z Entering 'third_party/flatbuffers'
2025-12-04T13:45:22.8573605Z Entering 'third_party/fmt'
2025-12-04T13:45:22.8597619Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T13:45:22.8621003Z Entering 'third_party/gloo'
2025-12-04T13:45:22.8646351Z Entering 'third_party/googletest'
2025-12-04T13:45:22.8670068Z Entering 'third_party/ideep'
2025-12-04T13:45:22.8695362Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T13:45:22.8724633Z Entering 'third_party/ittapi'
2025-12-04T13:45:22.8755250Z Entering 'third_party/kineto'
2025-12-04T13:45:22.8785312Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T13:45:22.8810648Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T13:45:22.8836340Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T13:45:22.8859235Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T13:45:22.8881334Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T13:45:22.8900643Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T13:45:22.8926691Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T13:45:22.8947067Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T13:45:22.8966984Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T13:45:22.8989249Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T13:45:22.9014260Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T13:45:22.9035589Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T13:45:22.9061069Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T13:45:22.9087507Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T13:45:22.9107599Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T13:45:22.9131764Z Entering 'third_party/kleidiai'
2025-12-04T13:45:22.9152978Z Entering 'third_party/mimalloc'
2025-12-04T13:45:22.9174527Z Entering 'third_party/nlohmann'
2025-12-04T13:45:22.9196602Z Entering 'third_party/onnx'
2025-12-04T13:45:22.9228356Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T13:45:22.9252490Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T13:45:22.9275021Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T13:45:22.9296936Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T13:45:22.9317606Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T13:45:22.9351485Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T13:45:22.9360306Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T13:45:22.9379459Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T13:45:22.9404637Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T13:45:22.9433428Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T13:45:22.9457252Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T13:45:22.9487950Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T13:45:22.9519350Z Entering 'third_party/pocketfft'
2025-12-04T13:45:22.9542622Z Entering 'third_party/protobuf'
2025-12-04T13:45:22.9566676Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T13:45:22.9591498Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T13:45:22.9617932Z Entering 'third_party/psimd'
2025-12-04T13:45:22.9640994Z Entering 'third_party/pthreadpool'
2025-12-04T13:45:22.9663886Z Entering 'third_party/pybind11'
2025-12-04T13:45:22.9685922Z Entering 'third_party/python-peachpy'
2025-12-04T13:45:22.9711919Z Entering 'third_party/sleef'
2025-12-04T13:45:22.9736545Z Entering 'third_party/tensorpipe'
2025-12-04T13:45:22.9765469Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T13:45:22.9789281Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T13:45:22.9817934Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T13:45:22.9847853Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T13:45:22.9874114Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T13:45:22.9918807Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
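Before scrubbing the per-module config files directly, the cleanup enumerates where each submodule's configuration actually lives. The --show-origin flag makes git config prefix every match with file:<path>, which is what produces the .git/modules/.../config paths scanned further down. Verbatim from the log, comment mine:

  # For each submodule, print the config file that defines
  # remote.origin.url, prefixed with file:<absolute path>.
  git submodule foreach --recursive \
    git config --local --show-origin --name-only --get-regexp remote.origin.url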
2025-12-04T13:45:22.9943559Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T13:45:23.0107786Z Entering 'android/libs/fbjni'
2025-12-04T13:45:23.0121508Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url
2025-12-04T13:45:23.0131361Z Entering 'third_party/FP16'
2025-12-04T13:45:23.0143793Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url
2025-12-04T13:45:23.0152760Z Entering 'third_party/FXdiv'
2025-12-04T13:45:23.0162380Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url
2025-12-04T13:45:23.0172723Z Entering 'third_party/NNPACK'
2025-12-04T13:45:23.0183569Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url
2025-12-04T13:45:23.0194073Z Entering 'third_party/NVTX'
2025-12-04T13:45:23.0205962Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url
2025-12-04T13:45:23.0214507Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T13:45:23.0226112Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url
2025-12-04T13:45:23.0233270Z Entering 'third_party/XNNPACK'
2025-12-04T13:45:23.0246172Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url
2025-12-04T13:45:23.0259198Z Entering 'third_party/aiter'
2025-12-04T13:45:23.0271842Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url
2025-12-04T13:45:23.0279779Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T13:45:23.0293558Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url
2025-12-04T13:45:23.0307788Z Entering 'third_party/benchmark'
2025-12-04T13:45:23.0318747Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url
2025-12-04T13:45:23.0327143Z Entering 'third_party/composable_kernel'
2025-12-04T13:45:23.0337639Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url
2025-12-04T13:45:23.0351324Z Entering 'third_party/cpp-httplib'
2025-12-04T13:45:23.0361853Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url
2025-12-04T13:45:23.0370030Z Entering 'third_party/cpuinfo'
2025-12-04T13:45:23.0380376Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url
2025-12-04T13:45:23.0390192Z Entering 'third_party/cudnn_frontend'
2025-12-04T13:45:23.0402228Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url
2025-12-04T13:45:23.0410539Z Entering 'third_party/cutlass'
2025-12-04T13:45:23.0420928Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url
2025-12-04T13:45:23.0433763Z Entering 'third_party/fbgemm'
2025-12-04T13:45:23.0443764Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url
2025-12-04T13:45:23.0453877Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T13:45:23.0465550Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url
2025-12-04T13:45:23.0474847Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T13:45:23.0488106Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url
2025-12-04T13:45:23.0501247Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T13:45:23.0512479Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url
2025-12-04T13:45:23.0520886Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T13:45:23.0529863Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url
2025-12-04T13:45:23.0543012Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T13:45:23.0554043Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url
2025-12-04T13:45:23.0565323Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T13:45:23.0576834Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url
2025-12-04T13:45:23.0585258Z Entering 'third_party/fbgemm/external/json'
2025-12-04T13:45:23.0595958Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url
2025-12-04T13:45:23.0609127Z Entering 'third_party/flash-attention'
2025-12-04T13:45:23.0618789Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url
2025-12-04T13:45:23.0639134Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T13:45:23.0645769Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url
2025-12-04T13:45:23.0659843Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T13:45:23.0671538Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url
2025-12-04T13:45:23.0688210Z Entering 'third_party/flatbuffers'
2025-12-04T13:45:23.0700874Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url
2025-12-04T13:45:23.0716033Z Entering 'third_party/fmt'
2025-12-04T13:45:23.0727715Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url
2025-12-04T13:45:23.0736870Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T13:45:23.0748428Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url
2025-12-04T13:45:23.0757993Z Entering 'third_party/gloo'
2025-12-04T13:45:23.0769676Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url
2025-12-04T13:45:23.0779830Z Entering 'third_party/googletest'
2025-12-04T13:45:23.0790689Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url
2025-12-04T13:45:23.0799785Z Entering 'third_party/ideep'
2025-12-04T13:45:23.0810478Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url
2025-12-04T13:45:23.0821073Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T13:45:23.0830900Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url
2025-12-04T13:45:23.0843936Z Entering 'third_party/ittapi'
2025-12-04T13:45:23.0856817Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url
2025-12-04T13:45:23.0865699Z Entering 'third_party/kineto'
2025-12-04T13:45:23.0876964Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url
2025-12-04T13:45:23.0885770Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T13:45:23.0897178Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url
2025-12-04T13:45:23.0906062Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T13:45:23.0917105Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url
2025-12-04T13:45:23.0927795Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T13:45:23.0938335Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url
2025-12-04T13:45:23.0947521Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T13:45:23.0957817Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url
2025-12-04T13:45:23.0965789Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T13:45:23.0977899Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url
2025-12-04T13:45:23.0987772Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T13:45:23.0997714Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url
2025-12-04T13:45:23.1009262Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T13:45:23.1020553Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url
2025-12-04T13:45:23.1034101Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T13:45:23.1043993Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url
2025-12-04T13:45:23.1053316Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T13:45:23.1063647Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url
2025-12-04T13:45:23.1073408Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T13:45:23.1083519Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url
2025-12-04T13:45:23.1092567Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T13:45:23.1102381Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url
2025-12-04T13:45:23.1111457Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T13:45:23.1120844Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url
2025-12-04T13:45:23.1131156Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T13:45:23.1141310Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url
2025-12-04T13:45:23.1155419Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T13:45:23.1166275Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url
2025-12-04T13:45:23.1175787Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T13:45:23.1187174Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url
2025-12-04T13:45:23.1198472Z Entering 'third_party/kleidiai'
2025-12-04T13:45:23.1212544Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url
2025-12-04T13:45:23.1222515Z Entering 'third_party/mimalloc'
2025-12-04T13:45:23.1233061Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url
2025-12-04T13:45:23.1242501Z Entering 'third_party/nlohmann'
2025-12-04T13:45:23.1262595Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url
2025-12-04T13:45:23.1267948Z Entering 'third_party/onnx'
2025-12-04T13:45:23.1278966Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url
2025-12-04T13:45:23.1294148Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T13:45:23.1306748Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url
2025-12-04T13:45:23.1320905Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T13:45:23.1332313Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url
2025-12-04T13:45:23.1343377Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T13:45:23.1354571Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url
2025-12-04T13:45:23.1364741Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T13:45:23.1376776Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url
2025-12-04T13:45:23.1386164Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T13:45:23.1397211Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url
2025-12-04T13:45:23.1406079Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T13:45:23.1417503Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url
2025-12-04T13:45:23.1427580Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T13:45:23.1438758Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url
2025-12-04T13:45:23.1448204Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T13:45:23.1460199Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url
2025-12-04T13:45:23.1470098Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T13:45:23.1481188Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url
2025-12-04T13:45:23.1490862Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T13:45:23.1501369Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url
2025-12-04T13:45:23.1512200Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T13:45:23.1523548Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url
2025-12-04T13:45:23.1534489Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T13:45:23.1545534Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url
2025-12-04T13:45:23.1564573Z Entering 'third_party/pocketfft'
2025-12-04T13:45:23.1578163Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url
2025-12-04T13:45:23.1586877Z Entering 'third_party/protobuf'
2025-12-04T13:45:23.1597300Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url
2025-12-04T13:45:23.1607780Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T13:45:23.1619180Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url
2025-12-04T13:45:23.1629355Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T13:45:23.1639971Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url
2025-12-04T13:45:23.1650479Z Entering 'third_party/psimd'
2025-12-04T13:45:23.1661789Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url
2025-12-04T13:45:23.1671660Z Entering 'third_party/pthreadpool'
2025-12-04T13:45:23.1683903Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url
2025-12-04T13:45:23.1694489Z Entering 'third_party/pybind11'
2025-12-04T13:45:23.1708611Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url
2025-12-04T13:45:23.1716687Z Entering 'third_party/python-peachpy'
2025-12-04T13:45:23.1729726Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url
2025-12-04T13:45:23.1738536Z Entering 'third_party/sleef'
2025-12-04T13:45:23.1748844Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url
2025-12-04T13:45:23.1763937Z Entering 'third_party/tensorpipe'
2025-12-04T13:45:23.1776503Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url
2025-12-04T13:45:23.1786234Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T13:45:23.1797336Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url
2025-12-04T13:45:23.1806850Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T13:45:23.1817345Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url
2025-12-04T13:45:23.1826237Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T13:45:23.1835686Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url
2025-12-04T13:45:23.1845214Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T13:45:23.1854523Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url
2025-12-04T13:45:23.1862942Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T13:45:23.1874480Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url
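Each config file discovered above is then checked for conditional includes: an [includeIf "gitdir:..."] section can pull additional configuration, and potentially credentials, in from another file, which the unset calls above would miss. The per-file commands below are the log's own; this loop is only an illustrative shell equivalent of the iteration the action drives from the path list it just collected:

  # Hypothetical stand-in for the action's internal loop: scan every
  # submodule config under .git/modules for includeIf.gitdir keys.
  # `|| :` ignores the exit status when a file has no such keys.
  find /home/runner/_work/pytorch/pytorch/.git/modules -type f -name config |
    while read -r cfg; do
      git config --file "$cfg" --name-only --get-regexp '^includeIf\.gitdir:' || :
    done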
2025-12-04T13:45:23.1913940Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.1938874Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.1956327Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.1971301Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.1986056Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2002074Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2016337Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2031185Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2044827Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2060222Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2073841Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2087603Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2102161Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2117273Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2131139Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2146368Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2159666Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2173022Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2186678Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2199840Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2213388Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2229360Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2243373Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2256546Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2270704Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2283905Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2297776Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2311551Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2325236Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2339131Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2352584Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2364997Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2377290Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2390322Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2403661Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2416761Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2431798Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2445562Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2458595Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2472557Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2487439Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2502582Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2516769Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2536470Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2550369Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2565564Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2583653Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2609309Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2624060Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2644523Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2663442Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2678428Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2691509Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2704095Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2716856Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2730275Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2745911Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2764122Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2784792Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2804609Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2820537Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2839365Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2855337Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2868905Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2884540Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2902937Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2921429Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2938039Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2952157Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2965148Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2977849Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.2991169Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3004069Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3017215Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3032356Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3046053Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3061122Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3074444Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3088005Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3102873Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3117856Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T13:45:23.3216462Z Cleaning up orphan processes
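The runner's final line marks its last teardown step: any processes the job spawned and left running are terminated so they cannot leak into the next job on this runner. Purely as an illustration of what "orphan" means here (my analogy and a hypothetical user name taken from the log's /home/runner paths, not the runner's actual implementation):

  # Processes reparented to PID 1 after the job's shells exited are
  # orphans; listing init's children owned by the runner user gives a
  # rough approximation of the set the runner reaps at this point.
  ps -eo pid,ppid,user,comm | awk '$2 == 1 && $3 == "runner"'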